PradoEversole738


Once your web site is up and running, you need to make sure that all visiting search engines can access every page you want them to see.

Sometimes, though, you may want search engines not to index certain areas of the site, or even to ban a particular search engine from the site altogether.

This is where a simple, small two-line text file called robots.txt comes in.

Robots.txt lives in your web site's root directory (on Linux systems this is typically your /public_html/ directory), and looks something like the following:

User-agent: *

Disallow:

The first line specifies which robot the rule applies to; the second line controls whether that robot is allowed in, or which areas of the site it is not allowed to see.
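You can check how these two lines behave programmatically. As a sketch, Python's standard-library `urllib.robotparser` module can parse robots.txt rules and report whether a given robot may fetch a URL (the site URL below is a hypothetical example):

```python
from urllib.robotparser import RobotFileParser

# The two-line allow-all robots.txt shown above.
rules = [
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)

# An empty Disallow value means every robot may fetch every path.
print(parser.can_fetch("GoogleBot", "http://www.example.com/any/page.html"))  # True
```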

If you want to handle multiple spiders, simply repeat the two lines above for each one.

So, for example:

User-agent: googlebot

Disallow:

User-agent: askjeeves

Disallow: /

This allows Google (user-agent name GoogleBot) to visit every page and service, while at the same time banning Ask Jeeves from the site completely.
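As a sketch, the same standard-library parser can confirm how these rules treat each robot (the URL is again a hypothetical example):

```python
from urllib.robotparser import RobotFileParser

# The per-robot rules shown above.
rules = [
    "User-agent: googlebot",
    "Disallow:",
    "",
    "User-agent: askjeeves",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

url = "http://www.example.com/index.html"
print(parser.can_fetch("googlebot", url))  # True: Google may crawl everything
print(parser.can_fetch("askjeeves", url))  # False: Ask Jeeves is banned entirely
```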

To get a reasonably up-to-date list of robot user-agent names, consult one of the online robots databases. Even if you'd like to allow every robot to index every page of your site, it is still well worth putting a robots.txt file on your site: it will stop your error logs filling up with entries from search engines trying to access a robots.txt file that doesn't exist.
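Creating that allow-all file takes only a couple of lines. A minimal sketch, assuming a `public_html` document root in the current directory (replace it with your site's actual root):

```python
from pathlib import Path

# Hypothetical document root; substitute your site's real root directory.
docroot = Path("public_html")
docroot.mkdir(exist_ok=True)

# A minimal allow-all robots.txt: every robot may index every page,
# and search engines no longer get a 404 when they request the file.
(docroot / "robots.txt").write_text("User-agent: *\nDisallow:\n")

print((docroot / "robots.txt").read_text())
```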

