
``urllib.robotparser`` --- Parser for robots.txt
*************************************************

This module provides a single class, ``RobotFileParser``, which
answers questions about whether or not a particular user agent can
fetch a URL on the Web site that published the ``robots.txt`` file.
For more details on the structure of ``robots.txt`` files, see
http://www.robotstxt.org/orig.html.

class urllib.robotparser.RobotFileParser(url='')

   This class provides methods to read, parse and answer questions
   about the ``robots.txt`` file at *url*.

   set_url(url)

      Sets the URL referring to a ``robots.txt`` file.

   read()

      Reads the ``robots.txt`` URL and feeds it to the parser.

   parse(lines)

      Parses the *lines* argument, an iterable of lines from a
      ``robots.txt`` file.
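
      For example, the lines can come from a source other than the
      network; here a small rule set is supplied directly (the
      ``Disallow`` rule and the ``example.com`` URLs are purely
      illustrative):

      >>> import urllib.robotparser
      >>> rp = urllib.robotparser.RobotFileParser()
      >>> rp.parse(["User-agent: *", "Disallow: /private/"])
      >>> rp.can_fetch("*", "http://example.com/private/page.html")
      False
      >>> rp.can_fetch("*", "http://example.com/index.html")
      True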

   can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the
      *url* according to the rules contained in the parsed
      ``robots.txt`` file.

   mtime()

      Returns the time the ``robots.txt`` file was last fetched.  This
      is useful for long-running web spiders that need to check for
      new ``robots.txt`` files periodically.

   modified()

      Sets the time the ``robots.txt`` file was last fetched to the
      current time.

The following example demonstrates basic use of the
``RobotFileParser`` class.

>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
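
Long-running spiders can use ``mtime()`` to decide when the cached
rules are stale, re-fetch them with ``read()``, and record the fetch
time with ``modified()``.  A minimal sketch, continuing the session
above (the one-hour threshold is an arbitrary choice):

>>> import time
>>> one_hour = 60 * 60
>>> if time.time() - rp.mtime() > one_hour:
...     rp.read()        # re-fetch and re-parse robots.txt
...     rp.modified()    # record when it was last fetched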
