Message-ID: <3DC6A002.3070205@apache.org>
Date: Mon, 04 Nov 2002 17:27:46 +0100
From: Nicola Ken Barozzi
Reply-To: nicolaken@apache.org
Organization: Apache Software Foundation
To: forrest-dev@xml.apache.org
Subject: Re: Link Crawling?
References: <3DC62E01.7070908@apache.org> <3DC69237.3060407@verizon.net> <3DC693EB.1060607@apache.org> <3DC69CD9.5050008@verizon.net>

Vadim Gritsenko wrote:
> Nicola Ken Barozzi wrote:
>
>>
>> Vadim Gritsenko wrote:
>>
>>> Nicola Ken Barozzi wrote:
>>>
>>>>
>>>> Peter Donald wrote:
>>>> > Hi,
>>>> >
>>>> > Is there any way I can add more strategies for link crawling during CLI
>>>> > operation? In particular I have a css sheet that has
>>>> >
>>>> > @import url("blah.css");
>>>> >
>>>> > but this won't ever be copied across because it is not crawled.
>>>> >
>>>> > Suggestions?
>>>>
>>>> Basically, the whole Cocoon CLI system has been hacked away by Stefano
>>>> and also Gianugo, and not much touched since then.
>>>>
>>>> It has been neglected for long, and as you know too well from the use
>>>> you made on Avalon site, it stopped at every single problem with links
>>>> it had, which BTW has never been the intention of the original writers.
>>>>
>>>> Lately I have tweaked it to output better info to the user and not to
>>>> break on broken links.
>>>> It still needs more work though.
>>>>
>>>> For now you have two options: include that link in the html as an
>>>> attribute to a tag (try ) or patch the
>>>> Cocoon CLI which is Main.java and many other classes.
>>>
>>>
>>> Actually, whole link extraction logic is in LinkSerializer and its
>>> parents.
>>
>> Actually there is some link processing in Main.java, look at
>>
>> public Collection processURI(String uri) throws Exception {...
>>
>> to see what I mean.
>
> Sorry, Ken, I don't see what you see: this method just collects &
> translates URIs returned by LinkSamplingEnv (== LinkSerializer) and
> tries to come up with the file name for it...

I think I'm just not thinking the same thing you are.

I am talking about the whole link-collecting+translating+whatever stuff.
Getting a name for a link is as important as getting the link in the
LinkSerializer.

As you see from the code below it works with the names of the files.

    String filename = (String)allTranslatedLinks.get(suri);
    if (filename == null) {
        filename = mangle(suri);
        final String type = getType(deparameterizedURI, parameters);
        final String ext = NetUtils.getExtension(filename);
        final String defaultExt = MIMEUtils.getDefaultExtension(type);
        if ((ext == null) || (!ext.equals(defaultExt))) {
            filename += defaultExt;
        }
        allTranslatedLinks.put(suri, filename);
    }

I know you know it, I just wanted to say that not all the link
processing+translation is done in the LinkSerializer, but also in other
places, and that the processing is started and managed by Main.java.

Phewww, I hope this finally got through! ;-)

> PS I've spent some time on this method, trust me
> http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/org/apache/cocoon/Attic/Main.java?rev=1.5&content-type=text/vnd.viewcvs-markup
>
> :)

Cool :-)

http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/java/org/apache/cocoon/Main.java

Thanks anyway for clarifying, you know how messy I am sometimes in
mails... when you stop, *then* I'll know there's a problem ;-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------
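
PS For anyone reading this in the archive: a minimal standalone sketch of
the naming step quoted above, in case it helps to see it run in isolation.
Everything named here is an assumption made for illustration. The class
FilenameSketch, the helpers extensionOf() and defaultExtension(), and the
body of mangle() are hypothetical stand-ins, not the Cocoon implementations
of mangle(), NetUtils.getExtension() or MIMEUtils.getDefaultExtension().
The MIME type is also passed in directly instead of being looked up with
getType().

```java
import java.util.HashMap;
import java.util.Map;

class FilenameSketch {
    // Cache of URI -> file name; plays the role of allTranslatedLinks.
    private final Map<String, String> translated = new HashMap<>();

    // Hypothetical stand-in for mangle(): replaces query characters
    // that are awkward in file names. NOT Cocoon's real logic.
    static String mangle(String uri) {
        return uri.replace('?', '_').replace('&', '_').replace('=', '_');
    }

    // Hypothetical stand-in for NetUtils.getExtension(): returns the
    // extension including the dot, or null if the last path segment has none.
    static String extensionOf(String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        return (dot > slash) ? name.substring(dot) : null;
    }

    // Hypothetical stand-in for MIMEUtils.getDefaultExtension():
    // a tiny hard-coded table, assumed for this sketch only.
    static String defaultExtension(String mimeType) {
        switch (mimeType) {
            case "text/html": return ".html";
            case "text/css":  return ".css";
            default:          return "";
        }
    }

    // Same shape as the excerpt: look up the cache, otherwise mangle the
    // URI and append the default extension when the existing one differs.
    String translate(String uri, String mimeType) {
        String filename = translated.get(uri);
        if (filename == null) {
            filename = mangle(uri);
            String ext = extensionOf(filename);
            String defaultExt = defaultExtension(mimeType);
            if (ext == null || !ext.equals(defaultExt)) {
                filename += defaultExt;
            }
            translated.put(uri, filename);
        }
        return filename;
    }

    public static void main(String[] args) {
        FilenameSketch sketch = new FilenameSketch();
        System.out.println(sketch.translate("index", "text/html"));
        System.out.println(sketch.translate("style.css", "text/css"));
        System.out.println(sketch.translate("page.xml", "text/html"));
    }
}
```

With these stand-ins, "index" served as text/html becomes "index.html",
"style.css" is left alone, and "page.xml" served as text/html becomes
"page.xml.html". The point is only that the name on disk depends on the
content type as well as on the URI, and that this happens outside the
LinkSerializer.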