cocoon-users mailing list archives

From "Conal Tuohy" <>
Subject How to "burst" a large document into small pieces?
Date Mon, 15 Dec 2003 02:50:02 GMT
I am looking for a way to improve performance "bursting" or splitting large documents into
smaller ones. 

Currently I have a pipeline that takes a large document and uses an XSLT to extract one part
of it (e.g. a chapter). The sitemap takes the chapter id from the request URI and passes it
to the XSLT. This is quite slow for large documents. For instance, consider a 5Mb document
with 50 chapters or sections each of 100kb. If I crawl my website and access each chapter,
then the pipeline will read this 5Mb document 50 times and extract each chapter. So 250Mb
of data passes through the pipeline, and a total of 5Mb returned (i.e. 50 x 100kb).
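The per-request cost described above can be sketched as follows. This is not the Cocoon/XSLT pipeline itself, just an illustrative Python stand-in (the `<book>`/`<chapter>` element names are assumptions) showing that the whole document is parsed again for every chapter requested:

```python
# Illustrative sketch only: the real setup uses an XSLT inside a Cocoon
# pipeline. The point is that every request re-parses the full document
# just to return one chapter of it.
import xml.etree.ElementTree as ET

# Stand-in for the large source document (element names are hypothetical).
DOC = """<book>
  <chapter id="ch1"><p>Chapter one text</p></chapter>
  <chapter id="ch2"><p>Chapter two text</p></chapter>
</book>"""

def extract_chapter(doc_xml, chapter_id):
    # Parse the entire document, then return just the requested chapter.
    root = ET.fromstring(doc_xml)
    for chapter in root.iter("chapter"):
        if chapter.get("id") == chapter_id:
            return ET.tostring(chapter, encoding="unicode")
    return None

# Serving every chapter this way traverses the full document each time,
# which is where the 50 x 5Mb = 250Mb of pipeline traffic comes from.
pages = [extract_chapter(DOC, cid) for cid in ("ch1", "ch2")]
```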

I'm wondering if I can improve performance by splitting the large document up front, writing
its chapters into separate files. This way I would need to traverse the document only once. For instance,
my 5Mb document would be split (once) into 50 files, and each file would be returned individually.
So 10Mb of data passes through the pipeline (i.e. 5Mb while splitting the document, plus 50
x 100kb returned to the browser).
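The one-pass split could look something like this. Again a hedged Python sketch rather than a Cocoon component (the `chapter` element, its `id` attribute, and the output file names are assumptions): the big document is traversed once, and each chapter is written to its own file for later requests to read directly:

```python
# Hypothetical one-pass "burst": write each chapter to its own file so
# later requests read only the small fragment, not the whole document.
# Element names, attributes, and paths are illustrative, not Cocoon APIs.
import xml.etree.ElementTree as ET
from pathlib import Path

def burst(doc_path, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = ET.parse(doc_path).getroot()   # one traversal of the big document
    written = []
    for chapter in root.iter("chapter"):
        target = out / "{}.xml".format(chapter.get("id"))
        target.write_bytes(ET.tostring(chapter))
        written.append(target)
    return written
```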

I think the SourceWritingTransformer might be used to split the documents, but I would also
need to be able to check the last-modified dates of the original file and the split files
so that the document could be re-split whenever it is edited.
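The last-modified comparison described above might be sketched like this (file paths and the `needs_resplit` helper are hypothetical, not part of SourceWritingTransformer): re-split whenever the source document is newer than the oldest derived chapter file, or when any chapter file is missing:

```python
# Hypothetical staleness check: re-split only when the source document
# has been edited since the chapter files were derived from it.
import os

def needs_resplit(source_path, chapter_paths):
    # No chapter files yet: the document has never been split.
    if not chapter_paths:
        return True
    try:
        source_mtime = os.path.getmtime(source_path)
        oldest_chapter = min(os.path.getmtime(p) for p in chapter_paths)
    except OSError:
        return True  # a chapter file is missing, so split again
    return source_mtime > oldest_chapter
```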

Alternatively, the FragmentExtractorTransformer might do it. I can't find much documentation
for this component though - and I don't really know how to use it.

Can anyone advise me about either of these approaches, or suggest any other ideas?



Conal Tuohy
Senior Programmer
New Zealand Electronic Text Centre
