cocoon-dev mailing list archives

From Giacomo Pati <pati_giac...@yahoo.com>
Subject Re: [C2] Link filtering and Content aggregation
Date Tue, 03 Oct 2000 11:19:14 GMT

--- Ross Burton <ross.burton@mail.com> wrote:
> > If you only used stylebook, I know you love it and don't see any
> > problems with it: it's a magic tool that does the job for you (more
> > or less an autodoc)... but if you ever tried to write a skin for it...
> > well, you know what I mean when I say there are problems.
> 
> Do _not_ remind me of the time I wrote a new skin for Stylebook!
> 
> 
> > link filtering
> > --------------
> > 
> > IMO, we need to expand the sitemap semantics to allow resources to
> > be blocked from CLI crawling. The best way, IMO, is to add a specific
> > attribute to the resource indicating elements... these elements are
> > 
> >  - match
> >  - mount

There is "select" as well, because someone can write a uri-selector
based on the selector interface (if you want to apply the crawl
attribute deep down the pipeline tree). We have decided that a pipeline
element has only "match" elements as direct children, so we could
say that a crawl attribute can be applied to those immediate "match"
elements only.

> > and we just have to define an attribute name between
> > 
> >  - crawl
> >  - crawlable
> >  - walk
> >  - walkable
> >  - ???

Does this have something to do with the well-known "robots.txt" file
used to keep spiders from stepping into specific URIs?

Shouldn't we expose the crawl attribute to the outside by answering a
request for the URI "robots.txt"? Or is crawling from the command line
different from crawling by a spider? The sitemap can check for that URI
if it fails to select a resource in a pipeline (falling through all
matches).
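
To make the robots.txt idea concrete, here is a minimal sketch. It
assumes the sitemap can enumerate its match patterns together with their
crawl flag. MatchRule and RobotsTxt are made-up names for illustration,
not existing Cocoon classes, and patterns are treated as plain URI
prefixes without wildcards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one <map:match> entry together with its crawl flag.
final class MatchRule {
    final String pattern;
    final boolean crawl;

    MatchRule(String pattern, boolean crawl) {
        this.pattern = pattern;
        this.crawl = crawl;
    }
}

// Hypothetical helper that renders the non-crawlable matches as robots.txt.
final class RobotsTxt {
    static String render(List<MatchRule> rules) {
        StringBuilder out = new StringBuilder("User-agent: *\n");
        boolean anyBlocked = false;
        for (MatchRule rule : rules) {
            if (!rule.crawl) {
                // Assumption: the pattern is a plain URI prefix.
                out.append("Disallow: /").append(rule.pattern).append('\n');
                anyBlocked = true;
            }
        }
        if (!anyBlocked) {
            // An empty Disallow value means nothing is blocked.
            out.append("Disallow:\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<MatchRule> rules = new ArrayList<>();
        rules.add(new MatchRule("someuri", false));
        rules.add(new MatchRule("index", true));
        System.out.print(render(rules));
    }
}
```

Whether real sitemap patterns (with wildcards) can be mapped onto
robots.txt prefixes this simply is an open question.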

> > 
> > for example
> > 
> >  <map:match pattern="someuri" crawl="no">
> >   ..
> >  </map:match>
> > 
> > will return a specific error number to the CLI requesting the page.

Is anybody familiar with the error numbers in use? Are any of them free
to use for custom needs?

> > 
> > What do you think?
> 
> The sitemap needs this sort of flexibility, there could be a section
> of the URI space which could potentially return gigabytes of files
> (for an example, see rpmfind.net).  I'm +1 on... crawl="yes|no".

I suggest that the Environment interface be expanded for this, so that
the sitemap engine can query whether a crawl is taking place (if we
don't choose the "robots.txt" approach mentioned above). I still don't
want the sitemap engine to deal with the Request/Response/Context
objects for that (until someone convinces me with a good reason).
Everything the sitemap engine needs must be expressed by the Environment
object passed in (even if that duplicates information available in the
Request/Response/Context objects).
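
A rough sketch of what I mean, under stated assumptions: the interface
name CrawlAwareEnvironment, the method isCrawling() and the CrawlCheck
class are invented here for illustration and are not the current
Environment API. Matching is a plain string comparison to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical slice of the Environment: only what the sitemap engine needs.
interface CrawlAwareEnvironment {
    String getUri();
    boolean isCrawling(); // true when the CLI is walking the URI space
}

final class SimpleEnvironment implements CrawlAwareEnvironment {
    private final String uri;
    private final boolean crawling;

    SimpleEnvironment(String uri, boolean crawling) {
        this.uri = uri;
        this.crawling = crawling;
    }

    @Override public String getUri() { return uri; }
    @Override public boolean isCrawling() { return crawling; }
}

// Hypothetical stand-in for the part of the sitemap engine that honours crawl="no".
final class CrawlCheck {
    enum Result { PROCESS, BLOCKED, NO_MATCH }

    // pattern -> crawl flag, for the immediate "match" children of a pipeline
    private final Map<String, Boolean> matches = new LinkedHashMap<>();

    void addMatch(String pattern, boolean crawl) {
        matches.put(pattern, crawl);
    }

    Result check(CrawlAwareEnvironment env) {
        // Assumption: exact string match instead of real pattern matching.
        Boolean crawl = matches.get(env.getUri());
        if (crawl == null) {
            return Result.NO_MATCH;
        }
        if (env.isCrawling() && !crawl) {
            return Result.BLOCKED; // the CLI would get the error number here
        }
        return Result.PROCESS;
    }
}
```

The engine only ever asks the Environment; it never looks at the
Request/Response/Context objects to find out who is calling.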

> > Content Aggregation
> > -------------------
> > 
> 
> > It was already proposed to use the "cocoon:" protocol and to access
> > them
> 
> And I'm a big +20 on this.
> 
> > so 
> > 
> >  <sitebar xinclude:href="cocoon:/sitebar"/>
> > 
> > is expanded at runtime as
> > 
> >  <sitebar>
> >   <item xlink:href=".."/>
> >   <item xlink:href="index"/>
> >   <item xlink:href="user-guide"/>
> >  </sitebar>
> 
> I take it that in this example the resource /sitebar returns the XML:
> 
>   <sitebar>
>     <item xlink:href=".."/>
>     ....
>   </sitebar>

Are you sure this should return the XML? Is this an implicit
"cocoon-view=first" parameter?

> 
> And _replaces_ the original <sitebar> element.  The behaviour would
> be the same if the original element was, for example: <foo
> xinclude:href="cocoon:/sitebar"/>, right? I'd feel safer using just
> <xinclude:include href="cocoon://sitebar"/>, as I think the syntax is
> clearer.

This calls for the XIncludeTransformer, and it seems clearer to me too.
Is this, for example, where "content aggregation" takes place? And
where else?
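
As a sketch of the replacement semantics only (not of how an
XIncludeTransformer would actually work on a SAX stream), the following
uses a DOM. IncludeExpander, the resolver function and the namespace URI
"urn:example:xinclude" are placeholders made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IncludeExpander {
    // Placeholder namespace for this sketch only, not an official XInclude URI.
    static final String INCLUDE_NS = "urn:example:xinclude";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Replaces every <include href="..."/> element in INCLUDE_NS with the root
    // element of the XML that the resolver returns for that href.
    // Includes nested inside the included content are not expanded.
    static void expand(Document doc, Function<String, String> resolver) {
        NodeList found = doc.getElementsByTagNameNS(INCLUDE_NS, "include");
        // Copy first: the NodeList is live and shrinks as nodes are replaced.
        List<Element> includes = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            includes.add((Element) found.item(i));
        }
        for (Element include : includes) {
            Document fragment = parse(resolver.apply(include.getAttribute("href")));
            Node imported = doc.importNode(fragment.getDocumentElement(), true);
            include.getParentNode().replaceChild(imported, include);
        }
    }
}
```

The resolver is where the "cocoon:" lookup would plug in; here it is
just a function from href to an XML string.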

> 
> Oh, IIRC the URI RFC states that the format is protocol://host/path,
> so the resource should be cocoon://sitebar or cocoon:///sitebar

True!

> depending on the sitemap.
> 
> This requires a custom URL handler, doesn't it?  How is this going to
> be handled?  org.apache.cocoon.utils.URL?

I don't know if this is possible. Does such a custom URL handler have
all the information necessary to fulfill that need? Wouldn't it be
better if the sitemap engine itself checked for this and somehow called
itself recursively?
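
Roughly like this, as a sketch under assumptions: MiniSitemap is a toy
stand-in for the sitemap engine, resources are looked up by plain name,
and cocoon:/x, cocoon://x and cocoon:///x are all treated as the same
resource "x" (which sidesteps the host/path question above rather than
answering it). The depth limit is an arbitrary guard against include
loops.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy stand-in for a sitemap engine that resolves "cocoon:" URIs by calling itself.
final class MiniSitemap {
    private static final String SCHEME = "cocoon:";
    private static final int MAX_DEPTH = 8; // arbitrary guard against include loops

    // resource name -> producer; a producer may ask the sitemap for other resources
    private final Map<String, Function<MiniSitemap, String>> resources = new HashMap<>();
    private int depth = 0;

    void add(String name, Function<MiniSitemap, String> producer) {
        resources.put(name, producer);
    }

    String process(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not an internal URI: " + uri);
        }
        // Simplification: drop the scheme and any leading slashes.
        String name = uri.substring(SCHEME.length()).replaceFirst("^/+", "");
        Function<MiniSitemap, String> producer = resources.get(name);
        if (producer == null) {
            throw new IllegalArgumentException("no such resource: " + name);
        }
        if (depth >= MAX_DEPTH) {
            throw new IllegalStateException("include depth exceeded at " + uri);
        }
        depth++;
        try {
            return producer.apply(this); // recursive entry into the sitemap
        } finally {
            depth--;
        }
    }

    public static void main(String[] args) {
        MiniSitemap sitemap = new MiniSitemap();
        sitemap.add("sitebar", s -> "<sitebar/>");
        sitemap.add("page", s -> "<page>" + s.process("cocoon://sitebar") + "</page>");
        System.out.println(sitemap.process("cocoon:/page"));
    }
}
```

The point is that the engine already has everything it needs to resolve
the resource, whereas a separate URL handler would first have to be
handed that information somehow.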

> 
> Ross Burton

Giacomo

=====
--
PWR GmbH, Organisation & Entwicklung      Tel:   +41 (0)1 856 2202
Giacomo Pati, CTO/CEO                     Fax:   +41 (0)1 856 2201
Hintereichenstrasse 7                     Mailto:Giacomo.Pati@pwr.ch
CH-8166 Niederweningen                    Web:   http://www.pwr.ch
