» robots.txt generator, make your own robots.txt |
May 23 2006, 07:12 AM
Post
#1
10 13 06 I made an error Group: Premium Posts: 3,191 Joined: 11-May 04 From: Beautiful South Florida Member No.: 184
I was surfing around and ran across this site. It generates a robots.txt file (a plain-text file in your site's root directory, not something you add to your HTML) that tells spiders you might not want to stay out of your site. It will also disallow some of those nasty email grabbers. Has anybody ever used it before? I want to apply it to my most famous blog, but I also don't want it to hurt my chances of staying listed on the search engines. There are a few other sites it might be good for, too.

Example code disallowing NASTY robots from spidering your site:

#
# robots.txt generated by www.1-hit.com's robot generator
# Please, we do NOT allow nonauthorized robots any longer.
#
User-agent: asterias
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: Black Hole
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Cegbfeieh
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CopyRightCheck
May 23 2006, 07:21 AM
Post
#2
Banned Group: Banned Posts: 6,648 Joined: 18-March 04 From: Alabama Member No.: 43
I don't see how that would kill search engines like Google, Yahoo, MSN, etc... since their bots aren't even listed.
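That's easy enough to check with Python's standard urllib.robotparser, which reads robots.txt the way a compliant crawler would. The sketch below feeds it a few records copied from the generated file in the first post (not the whole list; the can_crawl helper and the example.com URL are placeholders I made up) and asks whether Googlebot and a couple of the listed bots may fetch a page.

```python
from urllib.robotparser import RobotFileParser

# A few records copied from the generated file in the first post (not the full list).
ROBOTS_TXT = """\
User-agent: asterias
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CherryPicker
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler named `agent` may fetch `url`.

    can_crawl is a hypothetical helper; the actual parsing is done by the
    standard library's RobotFileParser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


page = "http://example.com/blog/index.html"  # placeholder URL
for agent in ("Googlebot", "BunnySlippers", "CherryPicker"):
    print(agent, can_crawl(ROBOTS_TXT, agent, page))
```

Googlebot comes back True because no record names it and there is no catch-all "User-agent: *" record; the two listed bots come back False because "Disallow: /" covers every path on the site.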
May 23 2006, 07:26 AM
Post
#3
10 13 06 I made an error Group: Premium Posts: 3,191 Joined: 11-May 04 From: Beautiful South Florida Member No.: 184
QUOTE
I don't see how that would kill search engines like Google, Yahoo, MSN, etc... since their bots aren't even listed.

So then you like it? You agree with using it?
May 23 2006, 07:32 AM
Post
#4
Member Group: Members Posts: 54 Joined: 18-April 06 Member No.: 1,227
Just remember, your robots.txt file is a suggestion to bots that just asks, "please don't look at this part of my web site!" The really nasty robots won't check robots.txt, simply ignore it, or worse -- concentrate on the areas that are off-limits.
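To make the "or worse" part concrete: every Disallow line is readable by anyone, so the file doubles as a list of the places you'd rather keep quiet. Here's a minimal Python sketch that pulls those paths out with plain string parsing, the way a misbehaving bot could. The function name disallowed_paths and the sample paths /admin/ and /private-notes/ are made up for illustration.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path in a robots.txt file, in order.

    disallowed_paths is a hypothetical helper: a compliant crawler uses these
    paths to stay out, but nothing stops a hostile bot from reading the same
    lines as a list of places worth visiting.
    """
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            # An empty "Disallow:" means "nothing is off-limits", so skip it.
            if value and value not in paths:
                paths.append(value)
    return paths


# Made-up example file; /admin/ and /private-notes/ are illustration only.
EXAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private-notes/   # please don't look here!

User-agent: Googlebot
Disallow:
"""

print(disallowed_paths(EXAMPLE))
```

For the example file this prints ['/admin/', '/private-notes/'], which is exactly the part of the site the owner was hoping nobody would look at.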
BTW - why does the web site list Mozilla and Wget as nasty bots?

QUOTE
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
User-agent: Wget/1.6

This post has been edited by Moldz: May 23 2006, 07:33 AM
May 23 2006, 07:44 AM
Post
#5
10 13 06 I made an error Group: Premium Posts: 3,191 Joined: 11-May 04 From: Beautiful South Florida Member No.: 184
QUOTE
Just remember, your robots.txt file is a suggestion to bots that just asks, "please don't look at this part of my web site!" The really nasty robots won't check robots.txt, simply ignore it, or worse -- concentrate on the areas that are off-limits. BTW - why does the web site list Mozilla and Wget as nasty bots?

I don't know why it listed Mozilla and Wget. They seem fairly harmless.
May 23 2006, 10:21 AM
Post
#6
Lots of free time Group: Members Posts: 1,194 Joined: 10-February 05 From: LA Member No.: 463
I read an interesting article sometime last year or so about punishing the spiders that didn't obey the robots file. They basically made a tarpit with some code that bogged the spider down, making it think that it was caching thousands of files instead of the handful that kept renaming themselves. That area of the site was then added to the robots file and the good bots would avoid it while the bad bots would get stuck there for hours. I think somebody here may have done that to their site.
-------------------- We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true.
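Here's a rough Python sketch of the tarpit idea described above, assuming a made-up /trap/ directory. robots.txt disallows it, so well-behaved bots never go in, while each trap page links only to more randomly named trap pages, so a bot that ignores robots.txt keeps finding "new" files and never runs out. This is not the code from the article; TRAP_PREFIX, random_name and trap_page are hypothetical names, and hooking it into a real web server is left out.

```python
import random
import string

# Hypothetical trap directory; it must match the Disallow line below so that
# well-behaved bots skip it and only robots.txt-ignoring bots wander in.
TRAP_PREFIX = "/trap/"

ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PREFIX}
"""


def random_name(rng: random.Random, length: int = 8) -> str:
    """Make a throwaway page name such as 'qzkdmwpa.html' (hypothetical helper)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length)) + ".html"


def trap_page(path: str, links: int = 20) -> str:
    """Build an HTML page for `path` that links only to more trap pages (hypothetical helper).

    The generator is seeded with the path, so the same URL always returns the
    same page, while every link leads to yet another generated page: a crawler
    that ignores robots.txt keeps following links forever.
    """
    if not path.startswith(TRAP_PREFIX):
        raise ValueError(f"trap pages live under {TRAP_PREFIX}")
    rng = random.Random(path)
    items = "\n".join(
        f'<li><a href="{TRAP_PREFIX}{random_name(rng)}">archive</a></li>'
        for _ in range(links)
    )
    return f"<html><body><h1>Archive</h1>\n<ul>\n{items}\n</ul>\n</body></html>"


print(ROBOTS_TXT)
print(trap_page(TRAP_PREFIX + "index.html", links=3))
```

Seeding with the request path keeps the fake directory looking stable to the crawler. The article's version also bogged the spider down; adding a delay before each response would be one way to do that, which this sketch doesn't attempt.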
May 23 2006, 10:29 AM
Post
#7
Banned Group: Banned Posts: 6,648 Joined: 18-March 04 From: Alabama Member No.: 43
QUOTE
I read an interesting article sometime last year or so about punishing the spiders that didn't obey the robots file. They basically made a tarpit with some code that bogged the spider down, making it think that it was caching thousands of files instead of the handful that kept renaming themselves. That area of the site was then added to the robots file and the good bots would avoid it while the bad bots would get stuck there for hours. I think somebody here may have done that to their site.

http://computerhelpforum.org/forum/programming/f46/http_computerhelpforum_org_forum_index_php_showtopic_6186/t6186.html
May 23 2006, 10:43 AM
Post
#8
Lots of free time Group: Members Posts: 1,194 Joined: 10-February 05 From: LA Member No.: 463
That's not what I'm talking about, but it's kinda close. That's the "generate fake files to screw with the RIAA" thing. The one I can remember tried to keep the spider in an endless loop.