Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A14E29AB6 for ; Wed, 21 Sep 2011 08:29:33 +0000 (UTC) Received: (qmail 56473 invoked by uid 500); 21 Sep 2011 08:29:32 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 56408 invoked by uid 500); 21 Sep 2011 08:29:32 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 56379 invoked by uid 99); 21 Sep 2011 08:29:31 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Sep 2011 08:29:31 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Sep 2011 08:29:29 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 58AECA6FF1 for ; Wed, 21 Sep 2011 08:29:09 +0000 (UTC) Date: Wed, 21 Sep 2011 08:29:09 +0000 (UTC) From: "Mark Dickensob (JIRA)" To: dev@lucene.apache.org Message-ID: <683120474.49855.1316593749359.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1400807836.49587.1316584149155.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109297#comment-13109297 ] Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:28 AM: --------------------------------------------------------------- Message Yes it is a goal!!! Obviously you dont run a big Apache site (no offence) here is the list of bad bots i have so far in .htaccess I can make this a file available for apache server users. If I am in the wrong group let me know where I can lodge this request PLEASE! # Kill bad bots # RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR] RewriteCond %{HTTP_REFERER} ^AEE- [OR] RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR] RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR] RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR] RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR] RewriteCond %{HTTP_USER_AGENT} DigExt [NC,OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} Java/1 [NC,OR] RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR] RewriteCond %{HTTP_USER_AGENT} Purebot [NC,OR] RewriteCond %{HTTP_USER_AGENT} SiteBot [NC,OR] RewriteCond %{HTTP_USER_AGENT} TalkTalk [NC,OR] RewriteCond %{HTTP_USER_AGENT} Webster [NC,OR] RewriteCond %{HTTP_USER_AGENT} WinHttp [NC] # RewriteCond %{HTTP_USER_AGENT} ^$ RewriteRule ^.* 403.shtml RewriteCond %{REMOTE_HOST} burningredirect\.com [NC,OR] RewriteCond %{REMOTE_HOST} hostnoc\.net [NC,OR] RewriteCond %{REMOTE_HOST} trygoclio.com\.com [NC,OR] RewriteCond %{HTTP_REFERER} http://s.nsdsvc\.com [NC,OR] RewriteCond %{HTTP_REFERER} http://djstools\.com [NC] was (Author: goan69): Message Yes it is a goal!!! Obviously you dont run a big Apache site (no offence) here is the list of bad bots i have so far in .htaccess I can make this a file available for apache server users. If I am in the wromg group let me know where I can lodge this request PLEASE! # Kill bad bots # RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR] RewriteCond %{HTTP_REFERER} ^AEE- [OR] RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR] RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR] RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR] RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR] RewriteCond %{HTTP_USER_AGENT} DigExt [NC,OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} Java/1 [NC,OR] RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR] RewriteCond %{HTTP_USER_AGENT} Purebot [NC,OR] RewriteCond %{HTTP_USER_AGENT} SiteBot [NC,OR] RewriteCond %{HTTP_USER_AGENT} TalkTalk [NC,OR] RewriteCond %{HTTP_USER_AGENT} Webster [NC,OR] RewriteCond %{HTTP_USER_AGENT} WinHttp [NC] # RewriteCond %{HTTP_USER_AGENT} ^$ RewriteRule ^.* 403.shtml RewriteCond %{REMOTE_HOST} burningredirect\.com [NC,OR] RewriteCond %{REMOTE_HOST} hostnoc\.net [NC,OR] RewriteCond %{REMOTE_HOST} trygoclio.com\.com [NC,OR] RewriteCond %{HTTP_REFERER} http://s.nsdsvc\.com [NC,OR] RewriteCond %{HTTP_REFERER} http://djstools\.com [NC] > add external http: include file reference for .htaccess processing > ------------------------------------------------------------------ > > Key: SOLR-2787 > URL: https://issues.apache.org/jira/browse/SOLR-2787 > Project: Solr > Issue Type: Improvement > Components: update > Affects Versions: 3.4 > Environment: All operating systems > Reporter: Mark Dickensob > Labels: Spam, killer > Original Estimate: 504h > Remaining Estimate: 504h > > Include an external link directive to an external http: file that supplies a (.htaccess compatible) list of known bad bot sites. > ie common resource for spam kill list site(s) > Personally, I run a portal and I think that this feature is important to kill spam! > I will supply the files for testing if you need them. > Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org