lenya-user mailing list archives

From Tim Hannigan <tim.hanni...@queensu.ca>
Subject Re: Caching Aggregation of Directories
Date Mon, 21 Aug 2006 14:32:28 GMT
Thanks a million, I'll get on trying this out today.


On 16-Aug-06, at 7:52 PM, solprovider@apache.org wrote:

> On 8/16/06, Tim Hannigan <tim.hannigan@queensu.ca> wrote:
>> Solprovider - I've managed to use the aggregate files piece
>> (http://solprovider.com/lenya/aggregatefiles) that you
>> wrote months back and it's great.
> Thanks.  It is nice to have one's work appreciated.
> [Summary: We tag documents and dynamically produce a report of
> documents with specific tags.  Producing the report is
> processing-intensive and we would like to use caching to improve
> performance. Lenya-1.2.4.]
>> I'm looking for a way to cache the dump pipeline (so that we don't
>> have to aggregate the entire site each time a newsAggregator doctype
>> is accessed).
>> I've looked into a few options and I'm looking for some advice on
>> which to go forward with.
>> One technique would be to somehow set a timed cache on just that
>> pipeline; however it looks like expires cache (as per the Cocoon docs)
>> is only available as of Cocoon 2.1.9 and my IT dept is set on using
>> Cocoon 2.1.7 for now (also I'm not even sure that 2.1.9 is compatible
>> with Lenya?).
>> A second technique I've considered would be to use the File Generator
>> (which uses cache very nicely) to bring in an xml site dump as a file;
>> this file would have to be generated outside of this pipeline, and
>> would be a precondition for the newsAggregator pipeline executing.
>> This leads me to consider 2 sub-options:
>> i) run a scheduled process that would call the
>> $pubname/siteAggregator.xml url, then take the xml output and write it
>> to a file in the publication's work directory.
>> I'm not entirely sure how Cocoon's scheduler works, but I suppose I
>> could have a shell script on a cron job that's doing a CURL. I'd love
>> to do it internally in Cocoon if I could.
>> ii) somehow leverage Lucene's site dump and use that instead.
>> I haven't used Lucene yet, so I'm not really sure how to use it in
>> this context. Am I correct to assume that Lucene has a cron job that
>> generates a dump on a prescribed timeline?
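
Sub-option (i) above could be sketched roughly as follows, as an alternative to a shell script calling curl. This is only a sketch under assumptions: the function name refresh_dump is made up for illustration, and the URL and target path passed to it would have to match the actual host, publication name and work directory. It fetches the aggregated XML and swaps it into place with a rename, so the File Generator should never read a half-written dump.

```python
import os
import tempfile
import urllib.request


def refresh_dump(url, target_path, timeout=60):
    """Fetch the XML at `url` and atomically replace `target_path` with it.

    Hypothetical helper: `url` would be the publication's
    siteAggregator.xml address, `target_path` a file in its work directory.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()

    target_dir = os.path.dirname(os.path.abspath(target_path))
    os.makedirs(target_dir, exist_ok=True)

    # Write to a temporary file in the same directory, then rename it over
    # the target, so readers see either the old dump or the new one.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file owner-only; make it readable by the
        # servlet container's user (assumption: that is a different user).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(data)
```

A cron entry would then call refresh_dump with the real URL and path at whatever interval the report is allowed to be stale.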
> This breaks into three functions:
> 1. Cache the results.
> 2. Use the cache if it exists.
> 3. Delete the cache on a schedule.
> The first two functions are built into publication-sitemap.xmap in
> Lenya-1.2.  They were disabled in 1.2.4 by adding "disabled" to the
> match of the pipeline.
> Another example is at:
> http://solprovider.com/lenya/cache
> This handles the issues of not caching pages if the visitor is logged
> in, or if there is a query string.  None of the expanded functionality
> matters in your case, but it shows the important lines from the
> standard publication-sitemap.xmap.  See the "Check Cache" and "Create
> Cache" commented sections.
> map:read is easy.  The SourceWritingTransformer is more complicated, and
> is documented at:
> http://cocoon.apache.org/2.1/userdocs/sourcewriting-transformer.html
> You may want to change the cache directory.  You may need custom
> addSourceTags.xsl and removeSourceTags.xsl.  Or maybe it will just
> work.
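
The "Check Cache" / "Create Cache" idea described above boils down to a read-through file cache. The sketch below shows that logic in Python rather than as a Cocoon sitemap, purely to make the control flow explicit; it is not the sitemap code referred to above, and the names read_through_cache and generate are invented for illustration.

```python
import os


def read_through_cache(cache_path, generate):
    """Return the cached bytes if `cache_path` exists; otherwise call
    `generate()`, store its result at `cache_path`, and return it."""
    # "Check Cache": serve the stored copy when it is present.
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as cached:
            return cached.read()

    # "Create Cache": run the expensive step once and keep the result.
    data = generate()
    os.makedirs(os.path.dirname(os.path.abspath(cache_path)), exist_ok=True)
    with open(cache_path, "wb") as out:
        out.write(data)
    return data
```

Nothing here expires the cached file; removing it is what forces the next request to regenerate the report.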
> ---
> #3 may require thought.  I have not used Lenya's Scheduler; my few
> attempts did not work, and I did not put much effort into it.  Maybe
> someone else can assist with it.
> #3 can be solved easily with a cron job that just deletes the files
> from the cache (assuming you are using a real operating system.)  That
> should take almost 30 seconds for a shell programmer.  If the files
> are deleted, then the "Check Cache" code fails, and "Create Cache" is
> called.
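
For #3, a shell one-liner in cron is probably all that is needed; the same thing in Python might look like the sketch below. The function name purge_cache is hypothetical, and the cache directory passed to it is an assumption that has to match wherever the sitemap writes its cached files. It deletes the files but leaves the directories in place.

```python
import os


def purge_cache(cache_dir):
    """Delete every file under `cache_dir`, keeping the directory tree.

    Returns the number of files removed.
    """
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            os.remove(os.path.join(root, name))
            removed += 1
    return removed
```

Once the files are gone, the next request finds no cached copy and the cache is rebuilt.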
> solprovider
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
> For additional commands, e-mail: user-help@lenya.apache.org
