forrest-dev mailing list archives

From Nicola Ken Barozzi <nicola...@apache.org>
Subject Re: Link Crawling?
Date Mon, 04 Nov 2002 16:27:46 GMT

Vadim Gritsenko wrote:
> Nicola Ken Barozzi wrote:
> 
>>
>> Vadim Gritsenko wrote:
>>
>>> Nicola Ken Barozzi wrote:
>>>
>>>>
>>>> Peter Donald wrote:
>>>> > Hi,
>>>> >
>>>> > Is there anyway I can add more strategies for link crawling during CLI
>>>> > operation? In particular I have a css sheet that has
>>>> >
>>>> > @import url("blah.css");
>>>> >
>>>> > but this wont ever be copied across because it is not crawled.
>>>> >
>>>> > Suggestions?
>>>>
>>>> Basically, the whole Cocoon CLI system has been hacked away by Stefano
>>>> and also Gianugo, and not much touched since then.
>>>>
>>>> It has been neglected for long, and as you know too well from the use
>>>> you made on Avalon site, it stopped at every single problem with links
>>>> it had, which BTW has never been the intention of the original writers.
>>>>
>>>> Lately I have tweaked it to output better info to the user and not to
>>>> break on broken links.
>>>> It still needs more work though.
>>>>
>>>> For now you have two options: include that link in the html as an
>>>> attribute to a tag (try <!-- <a href="blah.css"/> --> ) or patch the
>>>> Cocoon CLI which is Main.java and many other classes.
>>>
>>>
>>> Actually, whole link extraction logic is in LinkSerializer and its 
>>> parents.
>>
>> Actually there is some link processing in Main.java, look at
>>
>>   public Collection processURI(String uri) throws Exception {...
>>
>> to see what I mean. 
> 
> Sorry, Ken, I don't see what you see: this method just collects & 
> translates URIs returned by LinkSamplingEnv (== LinkSerializer) and 
> tries to come up with the file name for it...

I think we are just not talking about the same thing.
I am talking about the whole link-collecting+translating+whatever stuff.

Getting a name for a link is as important as extracting it in the LinkSerializer.
As you can see from the code below, it works with the names of the files.

         String filename = (String)allTranslatedLinks.get(suri);
         if (filename == null) {
             filename = mangle(suri);
             final String type = getType(deparameterizedURI, parameters);
             final String ext = NetUtils.getExtension(filename);
             final String defaultExt=MIMEUtils.getDefaultExtension(type);
             if ((ext == null) || (!ext.equals(defaultExt))) {
                 filename += defaultExt;
             }
             allTranslatedLinks.put(suri, filename);
         }


I know you know this; I just wanted to say that not all of the link
processing+translation is done in the LinkSerializer. Some of it happens
in other places, and the processing is started and managed by Main.java.
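
To make the snippet above easier to follow on its own, here is a minimal,
self-contained sketch of the same name-translation step. It is not the
actual Main.java code: mangle(), getExtension() and getDefaultExtension()
are simplified, hypothetical stand-ins for the Cocoon helpers, and the MIME
type is passed in as an argument instead of being obtained via getType().
It also assumes that extensions are compared with their leading dot.

```java
import java.util.HashMap;
import java.util.Map;

class LinkNameSketch {
    // Cache of source URI -> output file name, like allTranslatedLinks above.
    private final Map<String, String> allTranslatedLinks = new HashMap<>();

    // Hypothetical stand-in for mangle(): make the URI usable as a file name.
    static String mangle(String uri) {
        if (uri.isEmpty() || uri.endsWith("/")) {
            uri += "index";
        }
        return uri.replace('?', '_').replace('&', '_').replace('"', '_');
    }

    // Hypothetical stand-in for NetUtils.getExtension():
    // returns ".css" for "skin/blah.css", or null if there is no extension.
    static String getExtension(String name) {
        int dot = name.lastIndexOf('.');
        int slash = name.lastIndexOf('/');
        return (dot > slash) ? name.substring(dot) : null;
    }

    // Hypothetical stand-in for MIMEUtils.getDefaultExtension().
    static String getDefaultExtension(String mimeType) {
        switch (mimeType) {
            case "text/html": return ".html";
            case "text/css":  return ".css";
            default:          return "";
        }
    }

    // Same shape as the snippet above: look up the cached name, otherwise
    // mangle the URI, append the default extension if needed, and cache it.
    String translate(String suri, String mimeType) {
        String filename = allTranslatedLinks.get(suri);
        if (filename == null) {
            filename = mangle(suri);
            final String ext = getExtension(filename);
            final String defaultExt = getDefaultExtension(mimeType);
            if ((ext == null) || (!ext.equals(defaultExt))) {
                filename += defaultExt;
            }
            allTranslatedLinks.put(suri, filename);
        }
        return filename;
    }

    public static void main(String[] args) {
        LinkNameSketch sketch = new LinkNameSketch();
        System.out.println(sketch.translate("docs/faq", "text/html"));
        System.out.println(sketch.translate("skin/blah.css", "text/css"));
    }
}
```

The point of the sketch is only that the file name is decided outside the
link extraction itself: a link that is never collected never reaches this
step, and a collected link still needs this step to end up on disk.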

Phewww, I hope this finally got through! ;-)

> PS I've spent some time on this method, trust me
> http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/org/apache/cocoon/Attic/Main.java?rev=1.5&content-type=text/vnd.viewcvs-markup
>
> :)

Cool :-)

<me-too>
http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/java/org/apache/cocoon/Main.java
</me-too>

Thanks anyway for being precise, you know how messy I am sometimes in
mails... when you stop, *then* I'll know there's a problem ;-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)