forrest-dev mailing list archives

From Nicola Ken Barozzi <nicola...@apache.org>
Subject Re: cocoon crawler, wget, the problem of extracting links
Date Fri, 13 Dec 2002 16:05:45 GMT


Steven Noels wrote:
> Bruno Dumon wrote:
> 
>> Another solution would be to make a list of URLs for all these files
>> and feed that to the crawler. The thing that makes this list would of
>> course need to make some assumptions about how files on the filesystem
>> are mapped into the URL space.
> 
> Or vice-versa.
> 
> I'm still stuck on this idea of having a LinkResolverTransformer which, 
> given a configuration of schemes and their respective source resolution, 
> would rewrite links as needed. It might be "boneheaded me", and 
> orthogonal/supplementary to the sitemap and what is currently put 
> forward, but I want to do my thinking in public.

[...]

> Does this make sense at all?

Yes, it does.

It's exactly the same concept as in my "Concern 1" section about link 
lookup and resolving. I modeled it as an action, but forgot to add the 
transformation of links too, which you explain here.
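To make the idea concrete, here is a rough sketch of what such scheme-based 
link rewriting could look like. All names here (LinkResolver, addScheme, 
rewrite) are hypothetical illustrations, not an actual Cocoon transformer or 
its API; the real thing would of course operate on SAX events inside the 
pipeline rather than on plain strings:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: each configured scheme maps to a base that its
// links are rewritten against; anything else passes through untouched.
public class LinkResolver {

    private final Map<String, String> schemes = new HashMap<>();

    // Register a scheme and the base it resolves to
    // (stands in for the transformer's configuration).
    public void addScheme(String scheme, String base) {
        schemes.put(scheme, base);
    }

    // Rewrite a link if its scheme is configured; otherwise leave it as-is.
    public String rewrite(String link) {
        int colon = link.indexOf(':');
        if (colon < 0) {
            return link; // relative link, nothing to resolve
        }
        String scheme = link.substring(0, colon);
        String base = schemes.get(scheme);
        if (base == null) {
            return link; // unknown scheme, pass through
        }
        return base + link.substring(colon + 1);
    }

    public static void main(String[] args) {
        LinkResolver resolver = new LinkResolver();
        resolver.addScheme("site", "/docs/");
        resolver.addScheme("person", "mailto:");

        System.out.println(resolver.rewrite("site:faq"));          // /docs/faq
        System.out.println(resolver.rewrite("images/logo.png"));   // unchanged
    }
}
```

The point of the sketch is only that resolution is table-driven: the crawler 
(or wget) never sees the custom schemes, because they are rewritten to 
ordinary URLs before the links leave the pipeline.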

+1 (about the concept, we will see what makes more sense 
implementation-wise)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

