cocoon-dev mailing list archives

From Vadim Gritsenko
Subject Re: [RT] New Cocoon Site Crawler Environment
Date Wed, 18 Dec 2002 03:02:47 GMT
Nicola Ken Barozzi wrote:

> Vadim Gritsenko wrote:
>> Nicola Ken Barozzi wrote:
>> ...
>>> Why is it so slow?
>>> Mostly because it generates each source three times.


>>> Thus after the call we would have in the environment the result, the 
>>> type and the links, all in one call.
>> Type and links - yes, I agree. Content - no, we won't get correct 
>> content because links will not be translated in this content. And 
>> produced content is impossible to "re-link" because it can be any 
>> binary format supporting links (MS Excel, PDF, MS Word, ...)
> Ok, you are correct.
> Please add here the results we have come to in our fast AIM 
> discussion, I have to run now.

Ok, here is the thing. It is possible to get everything in one call (and 
- this remark goes to Berin - without an increase in resource consumption) 
if we (re)move the translateURI functionality from the Main. The problem is 
that the getType() method is used for only one purpose: to decide on a good 
name for the resulting file, that is, on a good extension according to 
the MIMEUtils settings. Another problem is that getLinks is 
used only to collect this information (about good names) and deliver it 
to the LinkTranslator transformer, which does the actual work of replacing 
links.

So, if we remove link translation from the Main, where can it go 
and how should it be done? There are several options.

1) Do not change names.
This works for everything except URIs ending with "/" - and for such 
URIs, we can use existing solution - add Constants.INDEX_URI to the end.
Points in favor of this method:
    * The generated site will be close to the live site with regard to file 
      names.
    * In Main.java, only one call is needed.
2) Change names according to the translation table supplied to the Main 
by the user.
This solution provides some flexibility (maybe too much of it).
Points in favor of this method:
    * Flexibility.
    * Same as above.
3) Change names as we did before, by utilizing MIMEUtils.
Points in favor of this method:
    * This is the backward compatible way.
    * We still have to know the types of all links to do the translation, 
      which means an extra getType() call on every link (excluding 
      duplicates - the information is cached). Hm, this one, actually, is 
      not in favor...
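
To make the three options easier to compare, here is a minimal Java sketch 
of each naming rule. It is not Cocoon code: the class UriNaming, its method 
names, the value of INDEX_URI (a stand-in for Constants.INDEX_URI) and the 
extension map (a stand-in for whatever MIMEUtils provides) are all 
assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical helper; none of these names exist in the Cocoon code base.
class UriNaming {
    // Stand-in for Constants.INDEX_URI; the value used here is assumed.
    static final String INDEX_URI = "index";

    // Option 1: keep the name, only completing URIs that end with "/".
    static String keepName(String uri) {
        return uri.endsWith("/") ? uri + INDEX_URI : uri;
    }

    // Option 2: look the URI up in a user-supplied translation table,
    // falling back to option 1 when the table has no entry.
    static String translateByTable(String uri, Map<String, String> table) {
        String mapped = table.get(uri);
        return mapped != null ? mapped : keepName(uri);
    }

    // Option 3: append an extension chosen from the MIME type. The caller
    // must already know the type, which is the extra getType() call per link.
    static String translateByType(String uri, String mimeType,
                                  Map<String, String> extensions) {
        String base = keepName(uri);
        String ext = extensions.get(mimeType);
        if (ext == null || base.endsWith(ext)) {
            return base;
        }
        return base + ext;
    }
}
```

Only translateByType needs a MIME type as input, which is why option (3) 
cannot be done without knowing the type of every link.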

This name translation can happen in the LinkTranslator transformer, which 
currently does the link translation magic. If we move all URI translation 
logic there, whatever it turns out to be (see points 1-3 above), it will be 
possible to implement Main in one step instead of three.

The exception is case (3), where complexity is added to 
LinkTranslator; still, we reduce the calls from 3 (per link) to 2.
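
As a rough illustration of the one-step idea, the sketch below crawls a 
site iteratively and makes exactly one pipeline call per URI, taking both 
the content and the links from that single call. It is only a sketch under 
assumptions: OnePassCrawl, Page and Pipeline are hypothetical names, not 
the actual Main or environment API, and link translation is assumed to 
have already happened inside the pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-pass crawler; not the actual Cocoon Main.
class OnePassCrawl {
    // Assumed result of a single pipeline call: content plus links found in it.
    static final class Page {
        final String content;
        final List<String> links;

        Page(String content, List<String> links) {
            this.content = content;
            this.links = links;
        }
    }

    // Assumed abstraction over "process this URI once".
    interface Pipeline {
        Page process(String uri);
    }

    // Iterative crawl: each URI is processed once, and duplicates are skipped.
    static Map<String, String> crawl(String start, Pipeline pipeline) {
        Map<String, String> site = new LinkedHashMap<String, String>();
        Deque<String> queue = new ArrayDeque<String>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String uri = queue.poll();
            if (site.containsKey(uri)) {
                continue; // already generated
            }
            Page page = pipeline.process(uri); // the only call for this URI
            site.put(uri, page.content);
            for (String link : page.links) {
                if (!site.containsKey(link)) {
                    queue.add(link);
                }
            }
        }
        return site;
    }
}
```

The queue keeps the processing iterative, so memory use grows with the 
number of pending links rather than with the depth of the link chain.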

> Thanks :-) 

You are welcome. I hope I told the story clearly enough.

>> But, there is hope to get all in once - if LinkSamplingTransformer 
>> will also be LinkTranslatingTransformer and will call Main back on 
>> every new link (recursive processing - as opposed to iterative 
>> processing in current implementation of the Main). The drawback of 
>> recursion approach is increased memory consumption.
> NAO = not an option

Yes, it was a totally wrong idea on my side.

> It doesn't scale, you are right.

And it never did. Amen.

