cocoon-dev mailing list archives

From Giacomo Pati <Giacomo.P...@pwr.ch>
Subject Re: [C2] Link filtering and Content aggregation
Date Sat, 07 Oct 2000 08:21:49 GMT
Stefano Mazzocchi wrote:
> 
> Giacomo Pati wrote:
> 
> > > more or less.... but it's pretty easy to have an implicit "robot.txt"
> > > resource directly created by Cocoon even if the file is not present
> > > based on sitemap parameters.
> >
> > Yes, but (after reading it) the robot.txt spec says that there is only
> > one robot.txt and the request URI is "/robot.txt" for the whole site (and
> > not as a sub context like "/cocoon/robot.txt").
> 
> Ah, ok. But we can at least have it generated by CLI from the sitemap,
> don't you think?

Sure, it seems to be simple. But you have to give me a few days until I'm
ready to take on this topic. I first want to finish the pooling and
component inheritance code I've begun.
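
A minimal sketch of what generating the file from the sitemap could look
like, assuming the CLI has already collected the URI prefixes marked as not
crawlable. The class and method names are hypothetical and not part of
Cocoon; note that the robots exclusion convention serves the file as
/robots.txt at the site root.

```java
import java.util.List;

/** Hypothetical helper (not Cocoon API): renders a robots exclusion file body. */
class RobotsTxtSketch {

    /**
     * Renders a single record that applies to all user agents and
     * disallows each of the given URI prefixes.
     */
    static String render(List<String> disallowedPrefixes) {
        StringBuilder sb = new StringBuilder("User-agent: *\n");
        if (disallowedPrefixes.isEmpty()) {
            // An empty Disallow value means nothing is excluded.
            sb.append("Disallow:\n");
        }
        for (String prefix : disallowedPrefixes) {
            sb.append("Disallow: ").append(prefix).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example prefixes are made up for illustration.
        System.out.print(render(List.of("/cocoon/private/", "/cocoon/drafts/")));
    }
}
```

The output would still have to be deployed by hand into the root context,
since the file is only looked up at the top of the site.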

> 
> (BTW, is there a sort of)
> 
> > > > Shouldn't we express the crawl attribute to the outside by a request
> > > > URI to "robot.txt"?
> > >
> > > exactly
> >
> > I must disagree after reading the robot.txt spec. It's not possible for
> > cocoon.
> 
> ok
> 
> > > > Or is crawling from the commandline and crawling by
> > > > a spider different?
> > >
> > > good point, didn't think of that. what do you think?
> >
> > Using /robot.txt means writing the robot.txt by hand, deploying it into
> > the root context and not specifying it in the sitemap. If we can't
> > exactly simulate a command line environment (like the http environment)
> > we need to distinguish between them because in fact there is no
> > difference between a spider and a browser.
> 
> right

I still don't know which of these we should go for:

- Cocoon2 has a built-in .../robot.txt on the root sitemap
or
- Cocoon2 sends a status code for a URI which has crawl=no
  if environment.isCrawling() (or whatever method name we choose)
  is true.
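
The second option could be sketched roughly as follows. Everything here is
an assumption for illustration: the Environment interface is a stand-in for
whatever abstraction ends up carrying isCrawling(), and 403 is only one
plausible choice since no particular status code has been settled on.

```java
/** Hypothetical stand-in for the environment abstraction (not Cocoon API). */
interface Environment {
    /** True when the request comes from the command-line crawler. */
    boolean isCrawling();
}

/** Hypothetical sketch of the status-code decision for a crawl=no URI. */
class CrawlGuardSketch {
    static final int SC_OK = 200;
    // Assumed status code; the choice of code is still open.
    static final int SC_FORBIDDEN = 403;

    /**
     * Returns the status to send for a URI, given whether the sitemap
     * allows crawling it (crawlAllowed == false corresponds to crawl=no).
     */
    static int statusFor(Environment env, boolean crawlAllowed) {
        if (env.isCrawling() && !crawlAllowed) {
            return SC_FORBIDDEN;
        }
        // Browsers, and spiders that look like browsers, are served normally.
        return SC_OK;
    }
}
```

With this shape, only the command-line environment is turned away from
crawl=no URIs; an external spider is indistinguishable from a browser and
still gets the page.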

Giacomo

> 
> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <stefano@apache.org>                             Friedrich Nietzsche
> --------------------------------------------------------------------
>  Missed us in Orlando? Make it up with ApacheCON Europe in London!
> ------------------------- http://ApacheCon.Com ---------------------

-- 
PWR GmbH, Organisation & Entwicklung      Tel:   +41 (0)1  856 2202
Giacomo Pati, CTO/CEO                     Fax:   +41 (0)1  856 2201
Hintereichenstrasse 7                     Mobil: +41 (0)78 759 7703
CH-8166 Niederweningen                    Mailto:Giacomo.Pati@pwr.ch
                                          Web:   http://www.pwr.ch
