You're one of our intelligent native English speakers, and I'm sure you'll have no problem understanding what robots.txt is. It's a good question. An important point for new webmasters to remember is that it's not 'robot' (singular) but 'robots' (plural), so your spelling of robots.txt is right.
This robots.txt thing is the robots exclusion standard (a.k.a. the robots exclusion protocol). Webmasters use this standard to let their websites communicate with web crawlers and other web robots (a note: not every web robot is a web crawler, a.k.a. 'spider'). Robots.txt specifies how to tell web robots which areas of a website should or should not be processed/scanned, and it is often used by search engines to categorize web sites. That's good, but... not all robots cooperate with the standard: e-mail harvesters, spambots and malware robots may even start with the portions of the website they have been told to stay out of! Since you mentioned sitemap.xml: robots.txt is different from it, but the two can be used in conjunction.

So, as you can already see, robots.txt is effective, but not 100% effective, and I think that is one of the first things you should know about it. Yes, you may choose to outline rules for the behavior of internet bots by implementing a robots.txt file (and this file is simply text stating the rules governing a bot's behavior). Any bot interacting with (or 'spidering') a server that does not follow these rules should, in theory, be denied access to, or removed from, the affected website. But if the only rule implementation on a server is a posted text file with no associated program/software, then adhering to those rules is entirely voluntary: in reality there is no way to enforce them, or even to ensure that a bot's creator or operator acknowledges, or even reads, the robots.txt file's contents.
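To see what that voluntary compliance looks like in practice, here is a minimal sketch of how a well-behaved bot could check robots.txt before fetching a page, using Python's standard urllib.robotparser module (the domain and the 'MyBot' user-agent name are just placeholders):

from urllib import robotparser

# Download and parse the site's robots.txt
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Ask whether our user-agent may fetch a given URL
if rp.can_fetch("MyBot", "https://www.example.com/cgi-bin/script.pl"):
    print("Allowed to fetch")
else:
    print("Disallowed by robots.txt")

Nothing forces a bot to run a check like this; a badly behaved bot simply skips it.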
If you are worried about bots that ignore these rules, you may want to read about the so-called 'spider trap':
https://en.wikipedia.org/wiki/Spider_trap

And now the examples:
User-agent: *
Disallow:
(This tells all robots that they can visit all files: the * matches all robots, and an empty Disallow value means nothing is off-limits. Writing 'Allow: /' gives the same result, but an empty Disallow is the original, universally supported form.)
User-agent: *
Disallow: /
(The opposite: it tells all robots to stay out of the entire website.)
User-agent: *
Disallow: /cgi-bin/
(It tells all robots not to enter one specific directory, /cgi-bin/ in this case.)
User-agent: *
Disallow: /directory/ourfile.html
(It tells all robots to stay away from one specific file; all other files in that directory will still be processed.)
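And since you mentioned sitemap.xml: you can also point robots to your sitemap and give rules to one specific bot only. A small combined sketch (the bot name, paths and sitemap URL are just placeholders):

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /tmp/
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml

(The first block bans only the bot that identifies itself as 'BadBot'; the * block applies to every bot not matched by a more specific block. The Sitemap line is a widely supported extension, not part of the original standard.)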
In case you need more examples, just let me know.