» robots.txt generator, make your own robots.txt
sligh
post May 23 2006, 07:12 AM
Post #1


10 13 06 I made an error

Group: Premium
Posts: 3,191
Joined: 11-May 04
From: Beautiful South Florida
Member No.: 184




I was surfing around and ran across this site. It generates a robots.txt file for your site that disallows the spiders you might not want crawling it.
It will also disallow those nasty email grabbers (some of them).
Has anybody ever used it before? I want to apply it to my most famous blog, but I don't want it to hurt the blog's chances of staying in the search engines. There are a few other sites it might be good for too.




Example code disallowing NASTY robots from spidering your site:

#
# robots.txt generated by www.1-hit.com's robot generator
# Please, we do NOT allow nonauthorized robots any longer.
#

User-agent: asterias
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: Black Hole
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Cegbfeieh
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CopyRightCheck


CypherXero
post May 23 2006, 07:21 AM
Post #2


Banned
*****

Group: Banned
Posts: 6,648
Joined: 18-March 04
From: Alabama
Member No.: 43




I don't see how that would kill search engines like Google, Yahoo, MSN, etc... since their bots aren't even listed.
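If you want to check this rather than eyeball it, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule set is a shortened copy of the generated file from the first post, and `allowed` is a helper name made up for this example. Python's matcher is only one interpretation of the rules, so real crawlers may behave slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the generated rules from the first post (assumption:
# the full file follows the same User-agent / Disallow pattern).
RULES = """\
User-agent: asterias
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
"""


def allowed(user_agent: str, url: str = "/index.html") -> bool:
    """Return True if these rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "asterias", "CherryPicker"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

A bot that is not named in any `User-agent` line, and with no `User-agent: *` group present, falls through to "allowed", which is why the big search engines are unaffected by a list like this.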
sligh
post May 23 2006, 07:26 AM
Post #3






QUOTE (CypherXero @ May 23 2006, 09:21 AM) *
I don't see how that would kill search engines like Google, Yahoo, MSN, etc... since their bots aren't even listed.


So then you like it? You agree with using it?


Moldz
post May 23 2006, 07:32 AM
Post #4


Member
*

Group: Members
Posts: 54
Joined: 18-April 06
Member No.: 1,227




Just remember, your robots.txt file is a suggestion to bots that just asks, "please don't look at this part of my web site!" The really nasty robots won't check robots.txt, simply ignore it, or worse -- concentrate on the areas that are off-limits.
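To make the "suggestion" point concrete, here is a rough sketch of where the check actually happens: inside the crawler, not on your server. `plan_crawl`, the `obey_robots` flag, the bot names, and the `/private/` rule are all invented for illustration; only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def plan_crawl(urls, user_agent, obey_robots=True):
    """Return the URLs a (hypothetical) crawler would go on to request."""
    if not obey_robots:
        # Nothing on the server side stops a crawler from skipping the check.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    urls = ["/index.html", "/private/secret.html"]
    print(plan_crawl(urls, "GoodBot"))                    # polite crawler
    print(plan_crawl(urls, "BadBot", obey_robots=False))  # ignores the file
```

The polite crawler drops the disallowed URL on its own initiative; the other one never looks at the file, and the server is none the wiser unless it enforces something itself.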

BTW - why does the web site list Mozilla and Wget as nasty bots?
QUOTE
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
User-agent: Wget/1.6


This post has been edited by Moldz: May 23 2006, 07:33 AM
sligh
post May 23 2006, 07:44 AM
Post #5






QUOTE (Moldz @ May 23 2006, 09:32 AM) *
Just remember, your robots.txt file is a suggestion to bots that just asks, "please don't look at this part of my web site!" The really nasty robots won't check robots.txt, simply ignore it, or worse -- concentrate on the areas that are off-limits.

BTW - why does the web site list Mozilla and Wget as nasty bots?

I don't know why it listed Mozilla and Wget. Those seem like fairly passive ones.


Watermark
post May 23 2006, 10:21 AM
Post #6


Lots of free time
****

Group: Members
Posts: 1,194
Joined: 10-February 05
From: LA
Member No.: 463




I read an interesting article sometime last year or so about punishing the spiders that didn't obey the robots file. They basically made a tarpit with some code that bogged the spider down, making it think that it was caching thousands of files instead of the handful that kept renaming themselves. That area of the site was then added to the robots file and the good bots would avoid it while the bad bots would get stuck there for hours. I think somebody here may have done that to their site.
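I don't have that article's code, so the following is only a guess at the general idea, not what they actually ran. It assumes a made-up `/trap/` area that would be listed under `Disallow` in robots.txt, and every page served from it links to a handful of fresh, never-before-seen pages in the same area. `TRAP_PREFIX`, `trap_links`, and `trap_page` are names invented for this sketch.

```python
import hashlib

# Hypothetical trap area; it would also be listed as "Disallow: /trap/"
# in robots.txt so that well-behaved bots never enter it.
TRAP_PREFIX = "/trap/"


def trap_links(path: str, count: int = 5) -> list[str]:
    """Derive `count` new trap URLs from the path that was requested."""
    links = []
    for i in range(count):
        # Hashing the current path gives different-looking file names on
        # every page without the server having to store anything.
        digest = hashlib.sha256(f"{path}:{i}".encode()).hexdigest()[:12]
        links.append(f"{TRAP_PREFIX}{digest}.html")
    return links


def trap_page(path: str) -> str:
    """Build the HTML served for any request under the trap prefix."""
    items = "\n".join(f'<a href="{link}">{link}</a>' for link in trap_links(path))
    return f"<html><body>\n{items}\n</body></html>"


if __name__ == "__main__":
    print(trap_page("/trap/start.html"))
```

A bot that ignores robots.txt and follows every link keeps finding "new" pages indefinitely. A real setup would presumably also answer slowly on purpose, which this sketch leaves out.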


--------------------
We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true.
CypherXero
post May 23 2006, 10:29 AM
Post #7






QUOTE (Watermark @ May 23 2006, 11:21 AM) *
I read an interesting article sometime last year or so about punishing the spiders that didn't obey the robots file. They basically made a tarpit with some code that bogged the spider down, making it think that it was caching thousands of files instead of the handful that kept renaming themselves. That area of the site was then added to the robots file and the good bots would avoid it while the bad bots would get stuck there for hours. I think somebody here may have done that to their site.


http://computerhelpforum.org/forum/programming/f46/http_computerhelpforum_org_forum_index_php_showtopic_6186/t6186.html
Watermark
post May 23 2006, 10:43 AM
Post #8






That's not what I'm talking about, but it's kinda close. That's the "generate fake files to screw with the RIAA" thing. The one I can remember tried to keep the spider in an endless loop.


