cocoon-dev mailing list archives

From Nicola Ken Barozzi <nicola...@apache.org>
Subject Re: [RT] New Cocoon Site Crawler Environment
Date Wed, 18 Dec 2002 07:34:29 GMT

Bernhard Huber wrote:
> Hi,

Hi :-)

> <big snip/>
> 
> ask Cocoon two things, make a Generator/Transformer to do the two things,
> 
> I'm now playing around with a SourceLinkStatusGenerator, which is like
> StatusGenerator but requests the links of a page not via an http: call
> but via a processor.process() call. It does this recursively: you ask
> SourceLinkStatusGenerator "give me all outbound links of index.html",
> and it returns an XML document with all links of the pages reachable
> from index.html.
> 
> You ask Cocoon: give me the content of page index.html plus its
> outbound links.
> 
> The only problem I see is that you will not get text/html if you ask
> Cocoon this question, but a text/html+application/x-cocoon-links
> response, taking the index.html example above.
> 
> Moreover you might have to adapt the sitemap to, let's say,
> <map:match pattern="crawling"> and ask Cocoon the right question
> within this map:match?

Actually I'd ask the question to the Environment, because link harvesting 
has to be plugged into the pipelines or the views in a non-intrusive manner.
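
To make the recursive part concrete, here is a minimal sketch of collecting every page reachable from a start page. It is not Cocoon code: RecursiveLinkCollector and the linksOf function are hypothetical names, and linksOf merely stands in for whatever ends up answering "which links does this page have" (a processor.process() call, the Environment, or something else).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: walks links transitively, visiting each page once. */
final class RecursiveLinkCollector {

    /**
     * @param start   the page to start from, e.g. "index.html"
     * @param linksOf assumed lookup returning the outbound links of one page
     * @return the start page plus every page reachable from it
     */
    static Set<String> collect(String start, Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String page = pending.pop();
            if (!seen.add(page)) {
                continue; // already visited, so link cycles terminate
            }
            for (String link : linksOf.apply(page)) {
                if (!seen.contains(link)) {
                    pending.push(link);
                }
            }
        }
        return seen;
    }
}
```

The visited set is what keeps pages that link back to index.html from looping forever; everything else depends on how the links are actually obtained.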

> Hmm, if you rely on links, you might want a LinkTransformer that does
> not throw away the page content, but harvests the links
> non-destructively.

Yes.
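
A rough sketch of what "non-destructive" could mean at the SAX level, using only the JDK's XMLFilterImpl rather than any Cocoon interface. LinkHarvestingFilter is a made-up name, and treating href and src attributes as the links is my assumption; the point is only that every event is forwarded unchanged while the links are recorded on the side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical sketch: records links while passing the content through. */
final class LinkHarvestingFilter extends XMLFilterImpl {
    private final List<String> links = new ArrayList<>();

    LinkHarvestingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Assumption: unqualified href and src attributes are the links.
        String href = atts.getValue("", "href");
        if (href != null) {
            links.add(href);
        }
        String src = atts.getValue("", "src");
        if (src != null) {
            links.add(src);
        }
        // Forward the event untouched so downstream steps still see the page.
        super.startElement(uri, localName, qName, atts);
    }

    List<String> getLinks() {
        return links;
    }

    /** Parses an XML string and returns the harvested links in document order. */
    static List<String> harvest(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        LinkHarvestingFilter filter = new LinkHarvestingFilter(reader);
        filter.setContentHandler(new DefaultHandler()); // stand-in for the next step
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.getLinks();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"a.html\">A</a><img src=\"logo.png\"/></body></html>";
        System.out.println(harvest(page));
    }
}
```

In a real pipeline the DefaultHandler would be replaced by whatever comes next (a serializer, say), which receives exactly the events it would have received without the filter.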

> Hmm, that would be best: no big sitemap changes, just another
> transforming step; instead of type="xslt" src="linkstatus.xslt",
> the new LinkAndContentTransformer step. But the content-type issue stays.

We could do away with it, and get the file as-is.

> btw, thanks for starting this RT. I don't have the passion to initiate 
> this, but it is necessary, and I appreciate it.

Wasn't it you who did the Ant stuff? Where do you think I got inspiration?

Thank *you* :-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)