httpd-users mailing list archives

From "Morgan Gangwere" <0.fracta...@gmail.com>
Subject [users@httpd] Re: Please help... apache hacked?
Date Sun, 23 Jul 2006 20:07:24 GMT
If you read the other posts, I made that same mistake. Apparently
bots.txt is for IRC bots... there is no Wikipedia page... hmmmm....

On 7/16/06, Hex Star <hexstar@gmail.com> wrote:
> Perhaps you meant robots.txt?
>
> Robots Exclusion Standard
> From Wikipedia, the free encyclopedia (redirected from Robots.txt
> <http://en.wikipedia.org/w/index.php?title=Robots.txt&redirect=no>)
>
> The *robots exclusion standard* or *robots.txt protocol* is a convention to
> prevent cooperating web spiders <http://en.wikipedia.org/wiki/Web_spider>
> and other web robots <http://en.wikipedia.org/wiki/Web_robot> from accessing
> all or part of a website <http://en.wikipedia.org/wiki/Website>. The parts
> that should not be accessed are specified in a file called *robots.txt* in
> the top-level directory of the website.
>
> The robots.txt protocol was created by consensus
> <http://www.robotstxt.org/wc/norobots.html> in June 1994 by members of the
> robots mailing list (robots-request@nexor.co.uk). There is no official
> standards body or RFC <http://en.wikipedia.org/wiki/Request_for_Comments>
> for the protocol.
>
> The protocol is purely advisory. It relies on the cooperation of the web
> robot, so that marking an area of your site out of bounds with robots.txt
> does not guarantee privacy. Many web site administrators have been caught
> trying to use the robots file to make private parts of a website invisible
> to the rest of the world. However, the file is necessarily publicly
> available and is easily checked by anyone with a web browser.
>
> The robots.txt patterns are matched by simple substring comparisons, so care
> should be taken to make sure that patterns matching directories have the
> final '/' character appended: otherwise all files with names starting with
> that substring will match, rather than just those in the directory intended.
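>
> For example (an illustrative sketch of the substring behaviour described
> above):
>
> Disallow: /private
>
> would also block /private.html and /privateer/, whereas
>
> Disallow: /private/
>
> blocks only the contents of the /private/ directory.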
>
> Examples
>
> This example *allows all robots* to visit *all files* because the wildcard
> "*" specifies all robots.
>
> User-agent: *
> Disallow:
>
> This example keeps *all robots out*:
>
> User-agent: *
> Disallow: /
>
> The next example tells *all crawlers* not to enter four
> directories of a website:
>
> User-agent: *
> Disallow: /cgi-bin/
> Disallow: /images/
> Disallow: /tmp/
> Disallow: /private/
>
> Example that tells *a specific crawler* not to enter one specific directory:
>
> User-agent: BadBot
> Disallow: /private/
>
> Example demonstrating how comments can be used:
>
> # Comments appear after the "#" symbol at the start of a line, or after a directive
> User-agent: * # match all bots
> Disallow: / # keep them out
>
>
> Compatibility
>
> In order to prevent access to all pages by robots,
>
> Disallow: *
>
> is not appropriate, as this is not a stable standard extension. For example,
> despite the fact that Google claims support for this directive [1], it in
> fact does not [2].
>
> Instead:
>
> Disallow: /
>
> should be used.
>
> Alternatives
>
> robots.txt is older and more widely accepted, but there are other methods
> (which can be used together with robots.txt) that allow greater control,
> like disabling indexing of images only or disabling archiving of page
> contents.
>
> HTML meta tags for robots
>
> HTML <http://en.wikipedia.org/wiki/HTML> meta tags
> <http://en.wikipedia.org/wiki/Meta_tag> can be used to exclude robots
> according to the contents of web pages. Again, this is purely advisory, and
> also relies on the cooperation of the robot programs. For example,
>
> <meta name="robots" content="noindex,nofollow" />
>
> within the HEAD section of an HTML <http://en.wikipedia.org/wiki/HTML>
> document tells search engines such as Google
> <http://en.wikipedia.org/wiki/Google>, Yahoo!
> <http://en.wikipedia.org/wiki/Yahoo%21>, or MSN
> <http://en.wikipedia.org/wiki/MSN> to exclude the page from their indexes
> and not to follow any links on the page for further possible indexing.
>
> (See HTML Author's Guide to the Robots META tag
> <http://www.robotstxt.org/wc/meta-user.html>.)
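>
> A minimal sketch of where the tag sits in a page (the surrounding markup is
> illustrative):
>
> <html>
>   <head>
>     <title>Example page</title>
>     <!-- ask cooperating robots not to index this page or follow its links -->
>     <meta name="robots" content="noindex,nofollow" />
>   </head>
>   <body>...</body>
> </html>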
>
> Directives within a page
>
> The <NOINDEX> tag is a non-standard HTML tag whose intent is to indicate
> portions of a page that should not be indexed, such as common navigation or
> a footer. Using it without a namespace will make XHTML
> <http://en.wikipedia.org/wiki/XHTML> pages invalid.
>
> Google uses comments for the same purpose: <!--googleoff: index--> ...
> <!--googleon: index-->
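>
> For example (a sketch; the surrounding markup is illustrative):
>
> <p>Main article text, indexed as usual.</p>
> <!--googleoff: index-->
> <div class="nav">Repeated site navigation that should not be indexed</div>
> <!--googleon: index-->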
>
> References
>
>    1. ^ http://www.google.com/webmasters/remove.html
>    2. ^ http://groups.google.com/groups?q=elvey+googlebot
>
>
> See also
>
>    - Web crawler <http://en.wikipedia.org/wiki/Web_crawler>
>    - The nofollow attribute
>      <http://en.wikipedia.org/wiki/Spam_in_blogs#rel.3Dnofollow>
>
>
> External links
>
>    - The robots.txt of the US White House <http://www.whitehouse.gov/robots.txt>
>    - How to keep bad robots, spiders and web crawlers away
>      <http://www.fleiner.com/bots/>
>    - robots.txt File <http://www.ilovejackdaniels.com/development/robots-txt-file/>
>      (tutorial on how and why to add a robots.txt file to websites, July 19, 2004)
>    - List of Bad Bots <http://www.kloth.net/internet/badbots.php>: a short
>      list of bad spiders and nasty bots seen on my different web sites
>    - HOWTO: Serving up default favicon.ico and robots.txt files with Apache
>      <http://laffey.tv/favicon_error_logs.html>
>    - Robots.txt Checker <http://tool.motoricerca.info/robots-checker.phtml>
>      (validates robots.txt files and gives optimization tips)
>    - A Standard for Robot Exclusion <http://www.robotstxt.org/wc/norobots.html>
>    - Robots Exclusion <http://www.robotstxt.org/wc/exclusion.html>
>    - Robots.txt Online Generator <http://www.yellowpipe.com/yis/tools/robots.txt/>
>    - RoboGen <http://www.rietta.com/robogen> - shareware Windows program for
>      creating and editing robot exclusion files
>    - White House site has oddities, like Bush site
>      <http://www.theinquirer.net/?article=19357>, Nick Farrell (October 29, 2004)
>    - Comprehensive Robots.txt Tutorial
>      <http://www.cre8asiteforums.com/forums/index.php?showtopic=8412&hl=>
>    - Using Robots.txt To Manage Search Engine Spiders
>      <http://www.servergrade.com.au/faq/answers/robots-text.html>
>
>  Retrieved from "http://en.wikipedia.org/wiki/Robots_Exclusion_Standard"
>
>
>
> On 7/15/06, Morgan Gangwere <0.fractalus@gmail.com> wrote:
> >
> > On 7/15/06, Ricardo Kleemann <ricardo@americasnet.com> wrote:
> > > Thanks Max.
> > >
> > > > A first look shows that the script "bots.txt" currently available
> > > > targets vulnerable installations of "Joomla" and "Mambo". There are
> > > > some vulnerabilities reported for the included phpBB and an extension
> > > > called perForms.
> > >
> > > But how, in the first place, is Apache even downloading the bots.txt,
> > > and then running it? Is it running in memory, since it's not anywhere
> > > in the filesystem?
> > >
> > > And what commands can be run on port 80 to download and run the
> > > script?
> > >
> > > >
> > > > The bot seems to join a specific IRC channel waiting for commands and
> > > > looking for new vulnerable installations via Google searches.
> > > >
> > > > Perhaps you want to replace any wget binaries with a shell script
> > > > logging the environment and command-line switches, to identify the
> > > > document used to retrieve the script.
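> > > >
> > > > A minimal sketch of such a wrapper (assuming the real binary has been
> > > > moved aside to /usr/bin/wget.real; all paths here are illustrative):
> > > >
> > > > #!/bin/sh
> > > > # Hypothetical /usr/bin/wget replacement: log caller details, then
> > > > # hand off to the real binary so nothing visibly breaks.
> > > > LOG=/var/log/wget-wrapper.log
> > > > {
> > > >   date                  # when the call happened
> > > >   echo "args: $*"       # command-line switches (and the URL fetched)
> > > >   echo "cwd:  $(pwd)"   # where it was invoked from
> > > >   env                   # environment, e.g. which CGI/PHP script ran it
> > > >   echo "---"
> > > > } >> "$LOG" 2>/dev/null
> > > > exec /usr/bin/wget.real "$@"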
> > > >
> > > >>  PLEASE HELP...
> > > >>
> > > >
> > > > You should stop your Apache! :D
> > > >
> > > > .max
> > > >
> > > >
> >
> > does ANYBODY even know what bots.txt DOES?
> >
> > bots.txt should look like this:
> >
> > accept all
> > reject altaVista
> >
> > Look at virussin.com/bots.txt to see what it SHOULD do... it's for
> > SEARCH ENGINES. The bot grabs it, looks at it, and if it's on the
> > white list of engines, it caches the site; if it's on the blacklist
> > (reject), it sulks away into a corner...
> >
> > M-g
> >
> > --
> > "Space does not reflect society, it expresses it." -- Castells, M.,
> > Space of Flows, Space of Places: Materials for a Theory of Urbanism in
> > the Information Age, in The Cybercities Reader, S. Graham, Editor.
> > 2004, Routledge: London. p. 82-93.
> >
>
>


-- 
"Space does not reflect society, it expresses it." -- Castells, M.,
Space of Flows, Space of Places: Materials for a Theory of Urbanism in
the Information Age, in The Cybercities Reader, S. Graham, Editor.
2004, Routledge: London. p. 82-93.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

