forrest-dev mailing list archives

From Upayavira
Subject Re: CLI Caching, etc
Date Mon, 18 Aug 2003 13:24:04 GMT
Jeff Turner wrote:

>On Sun, Aug 17, 2003 at 04:41:33PM +0100, Upayavira wrote:
>>Just to keep you up to date with my work on the CLI:
>>I found a bug which meant that Cocoon wasn't holding onto its cache each 
>>time it shut down and restarted, which explained why the CLI wasn't 
>>using its cache. Vadim fixed the bug.
>>So, the CLI can now read out of the cache correctly. However, it seems 
>>that the cache is either slightly slower than page generation, or much 
>>the same, so there's no real benefit of this bug fix, at least in this area.
>It would be interesting to know how much of the pipeline is actually
>cached.  The times for a first and second Forrest run are 2:36 and 2:39
>respectively, and they're suspiciously similar; as if the cache is being
>checked but not used.  For instance, rendering site.pdf takes 20s on
>first and second rendering.  The timestamps are very useful, btw!
From stepping through the code so far, I can see that content is being 
retrieved from the cache correctly - store.get() returns something each 
time. So I haven't yet worked out why it takes so long. But I will - 
someday :-(

>>Also, because pages now come from the cache, pipelines aren't processed, 
>>and the LinkGatherer component no longer works, so we only have LinkView 
>>gathering for following links :-(
>Hmm.. tricky.  If the LinkGatherer output is a byproduct of running the
>pipeline, and the pipeline output is cached, then perhaps the
>LinkGatherer output should also be cached?
Yes, that's what I'd like. But getting it cached is still a little 
beyond my level, and involves hacking around in places I feel a bit 
uncomfortable. Again, I'll get there though.
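
To make the idea concrete, here is a minimal sketch of storing the gathered links in the same cache entry as the page, so a cache hit returns both. It is not Cocoon's Store or pipeline API: LinkAwareCache, Entry and the generator function are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache that keeps gathered links next to the page content. */
public class LinkAwareCache {

    /** A generated page plus the links collected while generating it. */
    public static final class Entry {
        public final byte[] content;
        public final List<String> links;

        public Entry(byte[] content, List<String> links) {
            this.content = content;
            // Defensive copy so the cached links cannot be changed later.
            this.links = Collections.unmodifiableList(new ArrayList<>(links));
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    /** Counts how often the generator actually ran (cache misses). */
    public int generations = 0;

    /** Returns the cached entry, running the generator only on a miss. */
    public Entry get(String uri, Function<String, Entry> generator) {
        Entry entry = store.get(uri);
        if (entry == null) {
            entry = generator.apply(uri);
            store.put(uri, entry);
            generations++;
        }
        return entry;
    }
}
```

With something like this, a crawler could follow entry.links on a cache hit without the pipeline (and so the link gatherer) having to run again.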

>>The one benefit of this is that it is now easy to identify whether a 
>>page came out of the cache and, if it did, to compare the timestamp of 
>>the file on disc with the timestamp of the cached element, and only 
>>save to disc if the cached element is newer. So, we haven't yet speeded 
>>things up, but we have got it to only update changed files.
>I don't really understand this.  Surely if site.pdf takes 20s on first
>and second rendering, it's updating an unchanged file?
Because you actually have to generate the page (or at least get it out 
of the cache) in order to work out whether it has changed. So the time 
taken is the same, but the file is not written. It might be possible to 
just get the timestamp out of the cache, which would be quicker. This 
could benefit the servlet too - pass the last modified date in the 
environment and let the caching pipeline check whether the page has 
changed before retrieving the whole page. But still - a bit beyond 
me right now.
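
The "only save if the cached element is newer" check can be sketched roughly as below. This is an illustration under assumptions, not the CLI's actual code: ConditionalWriter and its methods are hypothetical, and the cached element's timestamp is assumed to be available as milliseconds since the epoch.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical helper: write a page only when the cached copy is newer. */
public class ConditionalWriter {

    /** True when the target is missing or older than the cached element. */
    public static boolean needsWrite(File target, long cachedLastModified) {
        if (!target.exists()) {
            return true;
        }
        return cachedLastModified > target.lastModified();
    }

    /** Writes the content if needed; returns whether a write happened. */
    public static boolean writeIfNewer(File target, byte[] content, long cachedLastModified) {
        if (!needsWrite(target, cachedLastModified)) {
            return false; // file on disc is already up to date
        }
        try {
            Files.write(target.toPath(), content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Note that the content still has to be fetched before writeIfNewer is called, which is why the run time does not drop. If the cache could hand back the timestamp alone, needsWrite could be called first and the retrieval skipped for unchanged pages.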

>>So, if you are happy with link view (for the moment), and like the idea 
>>of only updating pages that have changed, then update to CVS Cocoon. 
>>Otherwise, stick with the one you've got.
>Oh well, link-view lets us filter out unwanted links, even if it's really
>the user-agent's job (you convinced me;), so I'm happy with CVS.

>Oh, mind if I make one change to the output?  Instead of having the time
>on a separate line:
>* [0] document-v12.pdf
>         [1.356 seconds]
>* [38] community/howto/index.html
>         [0.524 seconds]
>* [0] community/howto/index.pdf
>         [0.262 seconds]
>* [0] /favicon.ico
>         [0.052 seconds]
>Have the times right-indented:
>* [0] document-v12.pdf                  [1.356 seconds]
>* [38] community/howto/index.html       [0.524 seconds]
>* [0] community/howto/index.pdf         [0.262 seconds]
>* [0] /favicon.ico                      [0.052 seconds]
>It saves lots of screen bandwidth, and makes the output more parseable.
On my first version, that's how I had it (without the justification 
though). But the version in CVS is a System.out.println hack that works 
around the BeanListener code. When I get to improving the bean listener 
code, I'll add a way for the bean to report back, and then the bean 
listener implementation can display it however it likes.
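
For what it's worth, the single-line layout Jeff describes could be produced along these lines. This is only a sketch: TimingLine is a hypothetical class, not part of the bean or the BeanListener code, and the column width is an arbitrary choice.

```java
import java.util.Locale;

/** Hypothetical formatter for one line of CLI progress output. */
public class TimingLine {

    /**
     * Pads "* [links] uri" to the given width, then appends the time.
     * A URI longer than the width is not truncated; the time simply
     * starts further to the right on that line.
     */
    public static String format(int links, String uri, double seconds, int width) {
        String left = "* [" + links + "] " + uri;
        return String.format(Locale.ROOT, "%-" + width + "s [%.3f seconds]", left, seconds);
    }

    public static void main(String[] args) {
        System.out.println(format(0, "document-v12.pdf", 1.356, 40));
        System.out.println(format(38, "community/howto/index.html", 0.524, 40));
        System.out.println(format(0, "/favicon.ico", 0.052, 40));
    }
}
```

Keeping the formatting in one method like this would also make it easy for a listener implementation to swap in a different layout later.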

>>I would be interested in your comments upon this mixed set of consequences.
>>I will try to get linkGathering working again, but it does involve 
>>digging a bit further than I'm used to.
>In true open-source fashion, we'll be here in our armchairs cheering you
>on ;)  I saw you commit some CLI refactorings - does that mean the code is
>stable enough yet for bystanders to start poking & trying to understand
I would say the code is stable, with the provisos we've discussed. But I 
do have further improvements planned over the next few weeks (relevant 
particularly to you: an Ant task, an <exclude> config element, and the 
ability to use link view without rewriting filenames).

Regards, Upayavira
