www-infrastructure-issues mailing list archives

From "Daniel Kulp (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (INFRA-1343) setup robots.txt and/or other access rules to prevent bots from crawling Continuum pages
Date Mon, 14 Dec 2009 14:35:18 GMT

     [ https://issues.apache.org/jira/browse/INFRA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Kulp reopened INFRA-1343:
--------------------------------



I'm reopening this because the current solution is extremely problematic.

The current robots.txt contains only:
Disallow: /confluence/


That means the "static" content for all the spaces is still indexable by crawlers. For projects
that copy the content to their own project sites, it is indexed both at cwiki and in the
"real" locations. In many cases, the cwiki pages show up in Google search results instead of
the real pages.

Basically, we need a way for each space to "opt out" of being indexed on cwiki.

For the short term, can we add:

Disallow: /CXF/
Disallow: /CXF20DOC/
Disallow: /ACTIVEMQ/
Disallow: /CAMEL/
Disallow: /SM/
Disallow: /SMX3/
Disallow: /SMX4/
Disallow: /SMX4KNL/
Disallow: /SMX4NMR/
Disallow: /SMX4RUN/
Disallow: /SMXCOMP/
Disallow: /TUSCANY/
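
One way to sanity-check a rule set like the one above before deploying it is to feed it to Python's standard `urllib.robotparser`. This is only a sketch: the `User-agent: *` line and the sample page paths are assumptions (the message shows just the Disallow lines), and real crawlers may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Space keys copied from the proposed Disallow list.
SPACES = ["CXF", "CXF20DOC", "ACTIVEMQ", "CAMEL", "SM", "SMX3", "SMX4",
          "SMX4KNL", "SMX4NMR", "SMX4RUN", "SMXCOMP", "TUSCANY"]


def build_robots_txt(spaces):
    """Render the existing /confluence/ rule plus one Disallow per space."""
    # Assumption: the rules apply to all crawlers ("User-agent: *").
    lines = ["User-agent: *", "Disallow: /confluence/"]
    lines += [f"Disallow: /{space}/" for space in spaces]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, path, agent="*"):
    """Return True if the given path is disallowed for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


ROBOTS_TXT = build_robots_txt(SPACES)

if __name__ == "__main__":
    # Hypothetical page paths, used only to exercise the rules.
    for path in ["/CXF/index.html", "/SMX4KNL/index.html", "/OTHER/index.html"]:
        print(path, "blocked" if is_blocked(ROBOTS_TXT, path) else "allowed")
```

Because the rules are path prefixes ending in a slash, `Disallow: /SMX4/` does not cover `/SMX4KNL/`, which is why each space key has to be listed separately.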


There are probably a bunch of others as well. I almost want to suggest that the default be
disallowed, with an "opt in" per space; I'm just not sure how to accomplish that.
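
A possible shape for the "default disallowed, opt in per space" idea is sketched below. It assumes crawlers honor the `Allow` directive, which is a widely supported extension rather than part of the original robots exclusion convention, so it may not work for every bot. The space key `EXAMPLE` and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_opt_in_robots_txt(opted_in_spaces):
    """Allow only the listed spaces; disallow everything else."""
    lines = ["User-agent: *"]
    # Allow lines come first so that first-match parsers see them
    # before the catch-all Disallow.
    lines += [f"Allow: /{space}/" for space in sorted(opted_in_spaces)]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def can_crawl(robots_txt, path, agent="*"):
    """Return True if the given path may be fetched by the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "EXAMPLE" is a hypothetical space that has opted in to indexing.
    robots_txt = build_opt_in_robots_txt(["EXAMPLE"])
    print(robots_txt)
    print(can_crawl(robots_txt, "/EXAMPLE/index.html"))
    print(can_crawl(robots_txt, "/CXF/index.html"))
```

With this layout, the list of opted-in spaces would need to be maintained somewhere and the file regenerated whenever a space opts in or out.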




> setup robots.txt and/or other access rules to prevent bots from crawling Continuum pages
> -----------------------------------------------------------------------------------------
>
>                 Key: INFRA-1343
>                 URL: https://issues.apache.org/jira/browse/INFRA-1343
>             Project: Infrastructure
>          Issue Type: Task
>      Security Level: public(Regular issues) 
>          Components: Continuum
>            Reporter: Brett Porter
>
> We don't need search engines crawling the build pages (especially since a crawler can navigate
> its way all the way through a working copy). They are presumably picking up links from the
> mails sent out to mailing lists.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

