forrest-dev mailing list archives

From Nicola Ken Barozzi
Subject Re: Link Crawling?
Date Mon, 04 Nov 2002 16:27:46 GMT

Vadim Gritsenko wrote:
> Nicola Ken Barozzi wrote:
>> Vadim Gritsenko wrote:
>>> Nicola Ken Barozzi wrote:
>>>> Peter Donald wrote:
>>>> > Hi,
>>>> >
>>>> > Is there any way I can add more strategies for link crawling during CLI
>>>> > operation? In particular I have a css sheet that has
>>>> >
>>>> > @import url("blah.css");
>>>> >
>>>> > but this won't ever be copied across because it is not crawled.
>>>> >
>>>> > Suggestions?
>>>> Basically, the whole Cocoon CLI system has been hacked away by Stefano
>>>> and also Gianugo, and not much touched since then.
>>>> It has been neglected for a long time, and as you know too well from
>>>> the use you made of it on the Avalon site, it stopped at every single
>>>> problem with links it had, which BTW was never the intention of the
>>>> original writers.
>>>> Lately I have tweaked it to output better info to the user and not to
>>>> break on broken links.
>>>> It still needs more work though.
>>>> For now you have two options: include that link in the html as an
>>>> attribute to a tag (try <!-- <a href="blah.css"/> --> ) or patch
>>>> Cocoon CLI which is and many other classes.
>>> Actually, the whole link extraction logic is in LinkSerializer and its
>>> parents.
>> Actually there is some link processing in, look at
>>   public Collection processURI(String uri) throws Exception {...
>> to see what I mean. 
> Sorry, Ken, I don't see what you see: this method just collects & 
> translates URIs returned by LinkSamplingEnv (== LinkSerializer) and 
> tries to come up with the file name for it...
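
For reference, Peter's original case needs one more extraction step: something has to pull the @import targets out of the stylesheet before they can be crawled at all. Below is only a rough sketch of that step, assuming a regex is good enough. CssImportLinks and extract are hypothetical names, not anything that exists in Cocoon, and the pattern ignores CSS comments and URLs containing spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cocoon: collects the targets of
// @import rules so a crawler could queue them like any other link.
class CssImportLinks {
    // Accepts @import url("x"), @import url('x'), @import url(x),
    // @import "x" and @import 'x'. Group 1 is the bare target.
    private static final Pattern IMPORT = Pattern.compile(
        "@import\\s+(?:url\\(\\s*)?[\"']?([^\"')\\s;]+)[\"']?",
        Pattern.CASE_INSENSITIVE);

    // Returns the import targets in the order they appear in the stylesheet.
    static List<String> extract(String css) {
        List<String> links = new ArrayList<>();
        Matcher m = IMPORT.matcher(css);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

Each returned target would still have to be resolved against the stylesheet's own URI before it is crawled.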

I think we are just not talking about the same thing.
I am talking about the whole link-collecting+translating+whatever stuff.

Getting a name for it is as important as getting it in the LinkSerializer.
As you see from the code below, it works with the names of the files.

         // Reuse the file name if this URI has already been translated.
         String filename = (String)allTranslatedLinks.get(suri);
         if (filename == null) {
             filename = mangle(suri);
             final String type = getType(deparameterizedURI, parameters);
             final String ext = NetUtils.getExtension(filename);
             final String defaultExt = MIMEUtils.getDefaultExtension(type);
             // Append the type's default extension when it is missing or different.
             if ((ext == null) || (!ext.equals(defaultExt))) {
                 filename += defaultExt;
             }
             allTranslatedLinks.put(suri, filename);
         }
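
To make the point about names concrete, here is a stand-alone sketch of the same translate-and-cache idea. It is not the Cocoon code: LinkNameTranslator, extensionOf and defaultExtensionFor are made-up stand-ins for the real helpers, and the MIME table and mangling rules are assumptions chosen only to keep the example small.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the translation step quoted above.
class LinkNameTranslator {
    // Cache of source URI -> translated file name, in the role of allTranslatedLinks.
    private final Map<String, String> translated = new HashMap<>();

    // Assumed mangling rule: replace query-string characters with underscores.
    static String mangle(String uri) {
        return uri.replace('?', '_').replace('&', '_').replace('=', '_');
    }

    // Assumed helper: extension including the dot, or null when the last
    // path segment has none.
    static String extensionOf(String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        return (dot > slash) ? name.substring(dot) : null;
    }

    // Assumed helper: a tiny MIME type -> default extension table.
    static String defaultExtensionFor(String mimeType) {
        switch (mimeType) {
            case "text/html": return ".html";
            case "text/css": return ".css";
            case "image/png": return ".png";
            default: return "";
        }
    }

    // Translates a URI once, then serves the cached name on later calls.
    String translate(String uri, String mimeType) {
        String filename = translated.get(uri);
        if (filename == null) {
            filename = mangle(uri);
            String ext = extensionOf(filename);
            String defaultExt = defaultExtensionFor(mimeType);
            if (ext == null || !ext.equals(defaultExt)) {
                filename += defaultExt;
            }
            translated.put(uri, filename);
        }
        return filename;
    }
}
```

With these assumptions, a URI with no extension picks up the default one for its type, while a URI that already ends in the right extension is left alone. The cache is what guarantees that every link to the same URI ends up pointing at the same file.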

I know you know it; I just wanted to say that not all the link
processing+translation is done in the LinkSerializer, but also in other
places, and that the processing is started and managed by

Phewww, I hope this finally got through! ;-)

> PS I've spent some time on this method, trust me :)

Cool :-)


Thanks anyway for clarifying, you know how messy I am sometimes in
mails... when you stop, *then* I'll know there's a problem ;-)

Nicola Ken Barozzi         
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
