cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [C2] Link filtering and Content aggregation
Date Thu, 05 Oct 2000 10:31:09 GMT
Giacomo Pati wrote:
> 
> --- Ross Burton <ross.burton@mail.com> wrote:
> > > If you only used stylebook, I know you love it and don't see any
> > > problems with it: it's a magic tool that does the job for you (more
> > > or less an autodoc)... but if you ever tried to write a skin for
> > > it... well, you know what I mean when I say there are problems.
> >
> > Do _not_ remind me of the time I wrote a new skin for Stylebook!
> >
> >
> > > link filtering
> > > --------------
> > >
> > > IMO, we need to expand the sitemap semantics to allow resources to
> > > be blocked from CLI crawling. The best way, IMO, is to add a
> > > specific attribute to the resource indicating elements... these
> > > elements are
> > >
> > >  - match
> > >  - mount
> 
> There is the "select" as well because someone can write a uri-selector
> based on the selector interface (if you want to apply the crawl
> attribute deep down the pipeline tree). We have decided that a pipeline
> element only has "match" elements as direct children, so that we could
> say a crawl attribute can be applied to those immediate "match"
> elements only.

Hmmmm, I think that the crawl attribute should be applied to "match"
only, even if the matcher is not using the URI to resolve the match.
Selection happens only after the matching has taken place, so this covers
them all. Don't you think?
 
> > > and we just have to define an attribute name between
> > >
> > >  - crawl
> > >  - crawlable
> > >  - walk
> > >  - walkable
> > >  - ???
> 
> Has this something to do with the known "robots.txt" file used to
> prevent spiders from stepping into specific URIs?

More or less... but it's pretty easy to have Cocoon create an implicit
"robots.txt" resource directly from sitemap parameters, even if the file
is not present.
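
A minimal sketch of that idea, not an actual Cocoon API: it assumes each
match entry carries a plain URI prefix (no wildcards) plus the proposed
crawl flag, and renders a robots.txt body from them. The class and method
names (RobotsTxtSketch, Match, render) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive a robots.txt body from sitemap match entries. */
final class RobotsTxtSketch {

    /** One map:match entry: its pattern and the proposed crawl flag. */
    static final class Match {
        final String pattern; // assumed to be a plain URI prefix without a leading slash
        final boolean crawl;  // true for crawl="yes", false for crawl="no"

        Match(String pattern, boolean crawl) {
            this.pattern = pattern;
            this.crawl = crawl;
        }
    }

    /** Emits one Disallow line for every match marked crawl="no". */
    static String render(List<Match> matches) {
        StringBuilder out = new StringBuilder("User-agent: *\n");
        boolean anyBlocked = false;
        for (Match m : matches) {
            if (!m.crawl) {
                out.append("Disallow: /").append(m.pattern).append('\n');
                anyBlocked = true;
            }
        }
        if (!anyBlocked) {
            // An empty Disallow value means nothing is blocked.
            out.append("Disallow:\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match("index", true));
        matches.add(new Match("someuri", false));
        System.out.print(render(matches));
    }
}
```

Wildcard patterns would need a translation step of their own, since
robots.txt rules are prefix based; the sketch sidesteps that on purpose.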
 
> Shouldn't we express the crawl attribute to the outside by a request
> URI to "robots.txt"?

exactly

> Or is crawling from the commandline and crawling by
> a spider different?

good point, didn't think of that. what do you think?

> The sitemap can check that uri if it fails to
> select a resource in a pipeline (falling through all matches).

right.

> > >
> > > for example
> > >
> > >  <map:match pattern="someuri" crawl="no">
> > >   ..
> > >  </map:match>
> > >
> > > will return a specific error number to the CLI requesting the page.
> 
> Is anybody familiar with the error numbers in use? Are there any free
> to use to implement custom needs?
> 
> > >
> > > What do you think?
> >
> > The sitemap needs this sort of flexibility: there could be a section
> > of the URI space which could potentially return gigabytes of files
> > (for an example, see rpmfind.net).  I'm +1 on... crawl="yes|no".
> 
> I suggest that the Environment interface needs to be expanded for that
> to make the sitemap engine able to query if a crawl is taking place (if
> we don't choose the "robots.txt" mentioned above). I still don't want
> the sitemap engine to deal with the Request/Response/Context objects
> for that (until someone convinces me with a good reason). All the sitemap
> engine needs must be expressed by the Environment object passed in
> (even if it may duplicate information available in the
> Request/Response/Context objects).

I agree. +1
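
A rough sketch of what that could look like, under stated assumptions: the
Environment interface below is a cut-down stand-in rather than the real
one, isCrawling() is a hypothetical accessor, matching is plain string
equality, and the outcome is an enum because the error number to return
has not been decided.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the sitemap engine asks only the Environment about crawling. */
final class CrawlCheckSketch {

    /** Cut-down stand-in for the Environment; isCrawling() is an assumed addition. */
    interface Environment {
        String getUri();
        boolean isCrawling();
    }

    /** One map:match entry with the proposed crawl flag. */
    static final class Match {
        final String pattern;
        final boolean crawl;

        Match(String pattern, boolean crawl) {
            this.pattern = pattern;
            this.crawl = crawl;
        }
    }

    /** Placeholder outcomes; the real error number is still an open question. */
    enum Outcome { SERVED, CRAWL_BLOCKED, NOT_FOUND }

    /** The first matching entry decides; a crawl is refused only on crawl="no". */
    static Outcome process(Environment env, List<Match> matches) {
        for (Match m : matches) {
            if (m.pattern.equals(env.getUri())) {
                if (env.isCrawling() && !m.crawl) {
                    return Outcome.CRAWL_BLOCKED;
                }
                return Outcome.SERVED;
            }
        }
        return Outcome.NOT_FOUND; // fell through all matches
    }

    /** Small helper that builds a fixed Environment for the demo. */
    static Environment env(final String uri, final boolean crawling) {
        return new Environment() {
            public String getUri() { return uri; }
            public boolean isCrawling() { return crawling; }
        };
    }

    public static void main(String[] args) {
        List<Match> matches = Arrays.asList(
                new Match("index", true),
                new Match("someuri", false));
        System.out.println(process(env("someuri", true), matches));
        System.out.println(process(env("someuri", false), matches));
        System.out.println(process(env("missing", true), matches));
    }
}
```

The point of the shape is that process() never touches
Request/Response/Context: everything it needs comes from the Environment
passed in.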
 
> > > Content Aggregation
> > > -------------------
> > >
> >
> > > It was already proposed to use the "cocoon:" protocol and to
> > > access them
> >
> > And I'm a big +20 on this.
> >
> > > so
> > >
> > >  <sitebar xinclude:href="cocoon:/sitebar"/>
> > >
> > > is expanded at runtime as
> > >
> > >  <sitebar>
> > >   <item xlink:href=".."/>
> > >   <item xlink:href="index"/>
> > >   <item xlink:href="user-guide"/>
> > >  </sitebar>
> >
> > I take it that in this example the resource /sitebar returns the XML:
> >
> >   <sitebar>
> >     <item xlink:href=".."/>
> >     ....
> >   </sitebar>
> 
> Are you sure this should return the XML? Is this an implicit
> "cocoon-view=first" parameter?

No, no, this is not an XInclude but an XLink: it will simply be
transformed to <a href=""> and passed to the client. No aggregation
takes place here.
 
> > And _replaces_ the original <sitebar> element.  The behaviour would be
> > the same if the original element was, for example: <foo
> > xinclude:href="cocoon:/sitebar"/>, right? I'd feel safer using just
> > <xinclude:include href="cocoon://sitebar"/>, as I think the syntax is
> > clearer.
> 
> This calls for the XIncludeTransformer and it seems clearer to me too.
> Is this where "content aggregation" takes place, for example? And
> where else?

I think my RT answered this. If not, say so.
 
> >
> > Oh, IIRC the URI RFC states that the format is protocol://host/path,
> > so the resource should be cocoon://sitebar or cocoon:///sitebar
> 
> True!

false! :)

cocoon://sitebar is wrong (as Peter correctly stated)... but I'm sure
*many* will get it wrong, so it's no big deal to ignore the purity of
the URI spec and allow this to work as well. I already picture "tons" of
user emails about this :// not working. :(
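
To make the difference concrete, here is a small sketch using the
standard java.net.URI parser. Read strictly, "cocoon://sitebar" names a
host called "sitebar" with an empty path, while "cocoon:/sitebar" and
"cocoon:///sitebar" both carry the path "/sitebar". The lenient helper
below (sitemapPath, a hypothetical name) folds a stray authority back into
the path so all three forms resolve to the same resource; it is one
possible way to be forgiving, not a description of how Cocoon does it.

```java
import java.net.URI;

/** Hypothetical sketch: tolerate "cocoon://x" by treating the authority as path. */
final class CocoonUriSketch {

    /** Returns the sitemap path addressed by a cocoon: URI. */
    static String sitemapPath(String uri) {
        URI parsed = URI.create(uri);
        if (!"cocoon".equals(parsed.getScheme())) {
            throw new IllegalArgumentException("not a cocoon: URI: " + uri);
        }
        // getPath() is null only for opaque URIs such as "cocoon:sitebar".
        String path = parsed.getPath() == null ? "" : parsed.getPath();
        String authority = parsed.getAuthority();
        if (authority != null) {
            // "cocoon://sitebar" parses as authority "sitebar" plus an empty
            // path; be lenient and prepend the authority to the path.
            path = "/" + authority + path;
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(sitemapPath("cocoon:/sitebar"));
        System.out.println(sitemapPath("cocoon://sitebar"));
        System.out.println(sitemapPath("cocoon:///sitebar"));
    }
}
```

The cost of this leniency is that the authority slot can never be used for
anything else later on.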
 
> > depending on the sitemap.
> >
> > This requires a custom URL handler, doesn't it?  How is this going to
> > be handled?  org.apache.cocoon.utils.URL?
> 
> I don't know if this is possible. Does such a custom URL handler have
> all the information necessary to fulfill that need? Wouldn't it be
> better if the sitemap engine itself checked this and somehow
> recursively called itself?

Totally. +1000 to this until we have a better URL handling package...
and it will take a while given current Avalon status and my time :(
 
> >
> > Ross Burton
> 
> Giacomo
> 
> =====
> --
> PWR GmbH, Organisation & Entwicklung      Tel:   +41 (0)1 856 2202
> Giacomo Pati, CTO/CEO                     Fax:   +41 (0)1 856 2201
> Hintereichenstrasse 7                     Mailto:Giacomo.Pati@pwr.ch
> CH-8166 Niederweningen                    Web:   http://www.pwr.ch
> 


-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------


