Robots.txt
I have a stupid question.
If a website runs on PHP, which folders should be allowed for the robots?
I mean, the structure of the site is now different from normal HTML sites, where the info is stored in the info folder.
But if I use the structure RECIPES, for example, there is no folder named recipes in my root...

http://blablabla.de and look for it

This is how I found one:
Code: Select all
# robots.txt for http://www.philipp-XXXXXX.de/
# file created: 08.08.01
User-agent: *
# Disallow: /cgi-bin/
# exclude robots from specified tree
# Disallow: /scripts/
Kompetenz in Präzisionsgewindespindeln, Fein- und Trapezgewindespindeln, gewindeschleifen,
Praezisionsgewindespindeln, Gewindespindeln, Feingewindespindeln, Trapezgewindespindeln,
Trapezgewindestangen, Feingewindetriebe, Trapezgewindetriebe, Gewindekerne, Gewinderollen,
rundschleifen, Schnittwerkzeuge, Präzisionsdrehteile, Praezisionsdrehteile, Präsionsfrästeile,
Praezisionsfraesteile, Werkzeugfertigung, drehen, fräsen, fraesen, aussenrundschleifen,
innenrundschleifen, flachschleifen, spitzenloses, schleifen, schneckenschleifen,
honen, Maschinen, Werkzeugbau.

All my sites run without robots.txt: one ranks at the top, the rest rank well.
greetz
Jürgen the robot
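A minimal sketch of how a parser reads a file like the one quoted above, assuming Python's standard urllib.robotparser as a stand-in for a crawler (real search engines may differ in details). The sample file, the agent name "AnyBot" and the paths are made up for illustration. It shows that everything after a # is treated as a comment, and that a plain line of keywords without a directive is simply skipped by this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file, loosely modelled on the one quoted above.
SAMPLE = """\
# robots.txt for http://www.example.com/
User-agent: *
# Disallow: /scripts/
Disallow: /cgi-bin/   # exclude robots from this tree
Keyword one, keyword two, keyword three
"""


def build_parser(text):
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE)

# The commented-out rule has no effect, the active one does.
scripts_allowed = parser.can_fetch("AnyBot", "/scripts/tool.php")
cgi_allowed = parser.can_fetch("AnyBot", "/cgi-bin/form.cgi")
print(scripts_allowed, cgi_allowed)
```

With this parser at least, the keyword line neither helps nor hurts: it is not a directive, so it is ignored.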


The robots.txt file is a very good way of directing search engines.
E.g. Google relies heavily on it to be told where to go, and many search engines also obey simple commands telling them which directories to ignore, as well as how long to wait between requests.
robots.txt can become very in-depth...
TriP
Thank you for the answers. I see a light in the dark.
That robots.txt is for directing search engines is easy for me to understand...
OK, for Google you can disallow the cgi-bin or other directories.
But DF6IH writes that he runs his sites without a robots.txt.
Does that mean the spiders crawl every folder? Or do they not crawl at all, because nothing tells them to crawl anything?
And if we apply this to phpwcms, for example, the spiders crawl every folder (allow *), but there are just the PHP files and no HTML-like content.
So what's the intention? Do the spiders "see" the PHP website like we do? Meaning, do the spiders see just the content?
For example, there is no use in spiders crawling the FCKEditor subfolders...
And what do the words in the robots.txt file mean? DF6IH writes:
Kompetenz in Präzisionsgewindespindeln, Fein- und Trapezgewindespindeln, gewindeschleifen,
Are these keywords for the content? If so, maybe I should put them in my robots.txt file, too.
My robots.txt file looks like this:
Code: Select all
User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /config/
Disallow: /include/
Disallow: /img/
Disallow: /phpwcms_ftp/
Disallow: /picture/
Disallow: /phpwcms_code_snippets/
If I understand you, I should change it to allow all folders except cgi-bin, config...
Thank you for teaching me. But you see, when I ask one question, many more come up after I read your answers.

The site http://www.blahblahblah.de you mentioned before is a moderator's website.
Anyway, I wish you all a happy Christmas...

Last edited by Buletti on Fri 23. Dec 2005, 12:38, edited 1 time in total.
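A rough way to check a rule set like the one above, sketched with Python's standard urllib.robotparser (an assumption: real crawlers may match slightly differently). The agent name "ExampleBot" and the test paths, including /recipes/ and /config/conf.inc.php, are hypothetical. The rules are compared against URL paths, so it does not matter whether a folder with that name exists on disk. An empty rule set is included to show what a crawler is allowed to do when nothing is disallowed.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the post above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /config/
Disallow: /include/
Disallow: /img/
Disallow: /phpwcms_ftp/
Disallow: /picture/
Disallow: /phpwcms_code_snippets/
"""

# Hypothetical URL paths; "/recipes/" stands for a virtual PHP URL
# that has no matching folder in the web root.
PATHS = ["/index.php", "/recipes/", "/config/conf.inc.php", "/img/logo.gif"]


def allowed_paths(robots_text, paths, agent="ExampleBot"):
    """Return {path: True/False} for one crawler and one rule set."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


with_rules = allowed_paths(RULES, PATHS)
without_rules = allowed_paths("", PATHS)  # empty file: no restrictions

for path in PATHS:
    print(path, with_rules[path], without_rules[path])
```

In this sketch only the paths under a disallowed prefix are blocked. Everything else, including the virtual /recipes/ URL, stays open, and with no rules at all every path is allowed.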
The robots.txt is also a good information file for script kiddies looking for directories that may have some interesting content in them to "hack" :)
Look at the MS robots, hehe:
http://www.microsoft.com/robots.txt
Phadda wrote: The robots.txt is also a good information file for script kiddies looking for directories that may have some interesting content in them to "hack". Look at the MS robots, hehe: http://www.microsoft.com/robots.txt
Or this one: http://www.whitehouse.gov/robots.txt
Looks like they don't want bad press (look at all the Iraq links :S). Talk about censorship! Shame on you, Bush!!
http://www.studmed.dk Portal for doctors and medical students in Denmark
Here is another example of what a robots.txt file can do.
Basically, it tells the MSN spider to stay away from the cgi-bin:
Code: Select all
User-agent: msnbot
Disallow: /cgi-bin
Crawl delay for msnbot:
Code: Select all
User-Agent: MSNbot
Crawl-Delay: 20
Ask Google Images not to index images:
Code: Select all
User-agent: Googlebot-Image
Disallow: /images
Basically, anything you do not want to be spidered needs to be in the robots.txt file.
So from the above you can see it can be used for a lot.
TriP
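The three snippets above can be tried out together in a small sketch, again assuming Python's standard urllib.robotparser in place of a real crawler. Two things here are assumptions and not from the post: the two msnbot snippets are merged into one group (this parser uses only the first group that matches an agent), and the test paths are made up.

```python
from urllib.robotparser import RobotFileParser

# The snippets from the post, with both msnbot rules in a single group.
RULES = """\
User-agent: msnbot
Crawl-delay: 20
Disallow: /cgi-bin

User-agent: Googlebot-Image
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# msnbot: blocked from the cgi-bin, asked to wait 20 seconds between hits.
msn_cgi = parser.can_fetch("msnbot", "/cgi-bin/form.cgi")
msn_page = parser.can_fetch("msnbot", "/index.php")
msn_delay = parser.crawl_delay("msnbot")

# Googlebot-Image: blocked from /images; the regular Googlebot is not.
image_bot = parser.can_fetch("Googlebot-Image", "/images/logo.gif")
web_bot = parser.can_fetch("Googlebot", "/images/logo.gif")

print(msn_cgi, msn_page, msn_delay, image_bot, web_bot)
```

Each group applies only to the crawler it names, so a bot that is not listed and finds no "User-agent: *" group is left unrestricted.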
I too would be interested to know whether anyone has scored better with the search engines by creating a robots.txt file excluding /phpwcms_filestorage/ etc., or whether the results are the same with no robots.txt file at all.
In my tests using mod_rewrite, a PHPWCMS site does (eventually) get deep-indexed by Google without robots.txt.
