
``rfc822`` --- Parse RFC 2822 mail headers
******************************************

Deprecated since version 2.3: The ``email`` package should be used in
preference to the ``rfc822`` module.  This module is present only to
maintain backward compatibility, and has been removed in Python 3.

This module defines a class, ``Message``, which represents an "email
message" as defined by the Internet standard **RFC 2822**. [1]  Such
messages consist of a collection of message headers, and a message
body.  This module also defines a helper class ``AddressList`` for
parsing **RFC 2822** addresses.  Please refer to the RFC for
information on the specific syntax of **RFC 2822** messages.

The ``mailbox`` module provides classes  to read mailboxes produced by
various end-user mail programs.

class class rfc822.Message(file[, seekable])

   A ``Message`` instance is instantiated with an input object as
   parameter. Message relies only on the input object having a
   ``readline()`` method; in particular, ordinary file objects
   qualify.  Instantiation reads headers from the input object up to a
   delimiter line (normally a blank line) and stores them in the
   instance.  The message body, following the headers, is not
   consumed.

   This class can work with any input object that supports a
   ``readline()`` method.  If the input object has seek and tell
   capability, the ``rewindbody()`` method will work; also, illegal
   lines will be pushed back onto the input stream.  If the input
   object lacks seek but has an ``unread()`` method that can push back
   a line of input, ``Message`` will use that to push back illegal
   lines.  Thus this class can be used to parse messages coming from a
   buffered stream.

   The optional *seekable* argument is provided as a workaround for
   certain stdio libraries in which ``tell()`` discards buffered data
   before discovering that the ``lseek()`` system call doesn't work.
   For maximum portability, you should set the seekable argument to
   zero to prevent that initial ``tell()`` when passing in an
   unseekable object such as a file object created from a socket
   object.

   Input lines as read from the file may either be terminated by CR-LF
   or by a single linefeed; a terminating CR-LF is replaced by a
   single linefeed before the line is stored.

   All header matching is done independent of upper or lower case;
   e.g. ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the
   same result.

class class rfc822.AddressList(field)

   You may instantiate the ``AddressList`` helper class using a single
   string parameter, a comma-separated list of **RFC 2822** addresses
   to be parsed.  (The parameter ``None`` yields an empty list.)

rfc822.quote(str)

   Return a new string with backslashes in *str* replaced by two
   backslashes and double quotes replaced by backslash-double quote.

rfc822.unquote(str)

   Return a new string which is an *unquoted* version of *str*. If
   *str* ends and begins with double quotes, they are stripped off.
   Likewise if *str* ends and begins with angle brackets, they are
   stripped off.

rfc822.parseaddr(address)

   Parse *address*, which should be the value of some address-
   containing field such as *To* or *Cc*, into its constituent
   "realname" and "email address" parts. Returns a tuple of that
   information, unless the parse fails, in which case a 2-tuple
   ``(None, None)`` is returned.

rfc822.dump_address_pair(pair)

   The inverse of ``parseaddr()``, this takes a 2-tuple of the form
   ``(realname, email_address)`` and returns the string value suitable
   for a *To* or *Cc* header.  If the first element of *pair* is
   false, then the second element is returned unmodified.

rfc822.parsedate(date)

   Attempts to parse a date according to the rules in **RFC 2822**.
   however, some mailers don't follow that format as specified, so
   ``parsedate()`` tries to guess correctly in such cases.  *date* is
   a string containing an **RFC 2822** date, such as  ``'Mon, 20 Nov
   1995 19:12:08 -0500'``.  If it succeeds in parsing the date,
   ``parsedate()`` returns a 9-tuple that can be passed directly to
   ``time.mktime()``; otherwise ``None`` will be returned.  Note that
   indexes 6, 7, and 8 of the result tuple are not usable.

rfc822.parsedate_tz(date)

   Performs the same function as ``parsedate()``, but returns either
   ``None`` or a 10-tuple; the first 9 elements make up a tuple that
   can be passed directly to ``time.mktime()``, and the tenth is the
   offset of the date's timezone from UTC (which is the official term
   for Greenwich Mean Time).  (Note that the sign of the timezone
   offset is the opposite of the sign of the ``time.timezone``
   variable for the same timezone; the latter variable follows the
   POSIX standard while this module follows **RFC 2822**.)  If the
   input string has no timezone, the last element of the tuple
   returned is ``None``.  Note that indexes 6, 7, and 8 of the result
   tuple are not usable.

rfc822.mktime_tz(tuple)

   Turn a 10-tuple as returned by ``parsedate_tz()`` into a UTC
   timestamp.  If the timezone item in the tuple is ``None``, assume
   local time.  Minor deficiency: this first interprets the first 8
   elements as a local time and then compensates for the timezone
   difference; this may yield a slight error around daylight savings
   time switch dates.  Not enough to worry about for common use.

See also:

   Module ``email``
      Comprehensive email handling package; supersedes the ``rfc822``
      module.

   Module ``mailbox``
      Classes to read various mailbox formats produced  by end-user
      mail programs.

   Module ``mimetools``
      Subclass of ``rfc822.Message`` that handles MIME encoded
      messages.


Message Objects
===============

A ``Message`` instance has the following methods:

Message.rewindbody()

   Seek to the start of the message body.  This only works if the file
   object is seekable.

Message.isheader(line)

   Returns a line's canonicalized fieldname (the dictionary key that
   will be used to index it) if the line is a legal **RFC 2822**
   header; otherwise returns ``None`` (implying that parsing should
   stop here and the line be pushed back on the input stream).  It is
   sometimes useful to override this method in a subclass.

Message.islast(line)

   Return true if the given line is a delimiter on which Message
   should stop.  The delimiter line is consumed, and the file object's
   read location positioned immediately after it.  By default this
   method just checks that the line is blank, but you can override it
   in a subclass.

Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just
   skipped. By default this is a stub that always returns ``False``,
   but you can override it in a subclass.

Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*,
   if any.  Each physical line, whether it is a continuation line or
   not, is a separate list item.  Return the empty list if no header
   matches *name*.

Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*,
   and its continuation line(s), if any.  Return ``None`` if there is
   no header matching *name*.

Message.getrawheader(name)

   Return a single string consisting of the text after the colon in
   the first header matching *name*.  This includes leading
   whitespace, the trailing linefeed, and internal linefeeds and
   whitespace if there any continuation line(s) were present.  Return
   ``None`` if there is no header matching *name*.

Message.getheader(name[, default])

   Return a single string consisting of the last header matching
   *name*, but strip leading and trailing whitespace. Internal
   whitespace is not stripped.  The optional *default* argument can be
   used to specify a different default to be returned when there is no
   header matching *name*; it defaults to ``None``. This is the
   preferred way to get parsed headers.

Message.get(name[, default])

   An alias for ``getheader()``, to make the interface more compatible
   with regular dictionaries.

Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string
   returned by ``getheader(name)``.  If no header matching *name*
   exists, return ``(None, None)``; otherwise both the full name and
   the address are (possibly empty) strings.

   Example: If *m*'s first *From* header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will
   yield the pair ``('Jack Jansen', 'jack@cwi.nl')``. If the header
   contained ``'Jack Jansen <jack@cwi.nl>'`` instead, it would yield
   the exact same result.

Message.getaddrlist(name)

   This is similar to ``getaddr(list)``, but parses a header
   containing a list of email addresses (e.g. a *To* header) and
   returns a list of ``(full name, email address)`` pairs (even if
   there was only one address in the header). If there is no header
   matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if
   there are several *Cc* headers), all are parsed for addresses. Any
   continuation lines the named headers contain are also parsed.

Message.getdate(name)

   Retrieve a header using ``getheader()`` and parse it into a 9-tuple
   compatible with ``time.mktime()``; note that fields 6, 7, and 8
   are not usable.  If there is no header matching *name*, or it is
   unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere
   to the standard.  While it has been tested and found correct on a
   large collection of email from many sources, it is still possible
   that this function may occasionally yield an incorrect result.

Message.getdate_tz(name)

   Retrieve a header using ``getheader()`` and parse it into a
   10-tuple; the first 9 elements will make a tuple compatible with
   ``time.mktime()``, and the 10th is a number giving the offset of
   the date's timezone from UTC.  Note that fields 6, 7, and 8  are
   not usable.  Similarly to ``getdate()``, if there is no header
   matching *name*, or it is unparsable, return ``None``.

``Message`` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises
``KeyError`` if there is no matching header; and ``len(m)``,
``m.get(name[, default])``, ``name in m``, ``m.keys()``,
``m.values()`` ``m.items()``, and ``m.setdefault(name[, default])``
act as expected, with the one difference that ``setdefault()`` uses an
empty string as the default value. ``Message`` instances also support
the mapping writable interface ``m[name] = value`` and ``del
m[name]``.  ``Message`` objects do not support the ``clear()``,
``copy()``, ``popitem()``, or ``update()`` methods of the mapping
interface.  (Support for ``get()`` and ``setdefault()`` was only added
in Python 2.2.)

Finally, ``Message`` instances have some public instance variables:

Message.headers

   A list containing the entire set of header lines, in the order in
   which they were read (except that setitem calls may disturb this
   order). Each line contains a trailing newline.  The blank line
   terminating the headers is not contained in the list.

Message.fp

   The file or file-like object passed at instantiation time.  This
   can be used to read the message content.

Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string.
   This is needed to regenerate the message in some contexts, such as
   an ``mbox``-style mailbox file.


AddressList Objects
===================

An ``AddressList`` instance has the following methods:

AddressList.__len__()

   Return the number of addresses in the address list.

AddressList.__str__()

   Return a canonicalized string representation of the address list.
   Addresses are rendered in "name" <host@domain> form, comma-
   separated.

AddressList.__add__(alist)

   Return a new ``AddressList`` instance that contains all addresses
   in both ``AddressList`` operands, with duplicates removed (set
   union).

AddressList.__iadd__(alist)

   In-place version of ``__add__()``; turns this ``AddressList``
   instance into the union of itself and the right-hand instance,
   *alist*.

AddressList.__sub__(alist)

   Return a new ``AddressList`` instance that contains every address
   in the left-hand ``AddressList`` operand that is not present in the
   right-hand address operand (set difference).

AddressList.__isub__(alist)

   In-place version of ``__sub__()``, removing addresses in this list
   which are also in *alist*.

Finally, ``AddressList`` instances have one public instance variable:

AddressList.addresslist

   A list of tuple string pairs, one per address.  In each member, the
   first is the canonicalized name part, the second is the actual
   route-address (``'@'``-separated username-host.domain pair).

-[ Footnotes ]-

[1] This module originally conformed to **RFC 822**, hence the name.
    Since then, **RFC 2822** has been released as an update to **RFC
    822**.  This module should be considered **RFC 2822**-conformant,
    especially in cases where the syntax or semantics have changed
    since **RFC 822**.
