cocoon-dev mailing list archives

From "Upayavira" ...@upaya.co.uk>
Subject CLI caching, etc (was Re: New error handling)
Date Wed, 16 Apr 2003 17:38:00 GMT
Vadim,

> >>1. Implement setStatus() in AbstractCommandLineEnvironment 
> >>(implementation is empty right now)
> >>2. Add getStatus() to the AbstractCommandLineEnvironment
> >>3. Test getStatus() in the CLI crawling code.
> >>4. Test how it works and fix the broken link :)

Works a treat! Thanks. Although I had to modify the sitemap to give error codes 
(thanks Jeremy for your recent mail!)
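For the record, the four steps above boil down to something like the sketch below. The class names StatusTrackingEnvironment and LinkChecker are made up for illustration (the real change goes into AbstractCommandLineEnvironment and the CLI crawling code), and treating any status of 400 or above as a broken link is my assumption.

```java
// Hypothetical stand-in for the command line environment.
// It is NOT Cocoon's actual AbstractCommandLineEnvironment.
class StatusTrackingEnvironment {
    // Assume success until the sitemap reports otherwise.
    private int status = 200;

    // Called by the pipeline, e.g. when the sitemap sets an error code.
    void setStatus(int statusCode) {
        this.status = statusCode;
    }

    // Read back by the crawler after the page has been processed.
    int getStatus() {
        return status;
    }
}

// Hypothetical helper for the crawling code.
class LinkChecker {
    // Assumption: any 4xx or 5xx status means the link is broken.
    static boolean isBroken(StatusTrackingEnvironment env) {
        return env.getStatus() >= 400;
    }
}
```

The crawler would call isBroken() after processing each URI and report the link rather than writing the page.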

> Not will, but does! This was done long time ago (for http), otherwise
> how you will get 404 in the browser? :)

That's kinda what I meant ;-) 

> >Similarly, based upon comments from Nicola Ken ages ago:
> >
> >>>In the Environment there is
> >>>
> >>>    boolean isResponseModified(long lastModified);
> >>>    void setResponseIsNotModified();
> >>>
> >>>But it's never implemented. In AbstractEnvironment:
> >>>
> >>>    public boolean isResponseModified(long lastModified) {
> >>>        return true; // always modified
> >>>    }
> >>>
> >>>    public void setResponseIsNotModified() {
> >>>        // does nothing
> >>>    }
> >
> >Similarly, the setResponseIsNotModified() will be called on the
> >current environment if a response was read from the cache. At
> >present, this method does nothing.

> Before you go further with this... Look at method isResponseModified()
> in [1].
>  
> What you need to do is to:
> 1. Implement method isResponseModified() for command line environment.
> 2. In the CLI, get the file corresponding to the request URI, and get
>    its last modification time.
> 3. Populate environment with this modification time (this will be
>    similar to If-Modified-Since date header in http).
> 4. Call cocoon. It will skip generation if response is not modified,
>    and won't even read it from cache.

Very interesting. So Cocoon can tell me if something has been modified. Great. 
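As I understand steps 1 to 3, the environment would end up looking roughly like the sketch below. TimestampedEnvironment is a hypothetical name, not an existing Cocoon class, and the rule of regenerating whenever either timestamp is unknown (zero or negative) is my own assumption.

```java
// Hypothetical sketch of a command line environment that knows when the
// existing copy of a page was written. Not an actual Cocoon class.
class TimestampedEnvironment {
    // Last modification time of the existing output, in milliseconds.
    // Zero or negative means no earlier copy is known.
    private final long existingLastModified;
    private boolean notModified = false;

    TimestampedEnvironment(long existingLastModified) {
        this.existingLastModified = existingLastModified;
    }

    // lastModified is the time the pipeline's content last changed.
    boolean isResponseModified(long lastModified) {
        // Assumption: if either timestamp is unknown, regenerate to be safe.
        if (existingLastModified <= 0 || lastModified <= 0) {
            return true;
        }
        return lastModified > existingLastModified;
    }

    // Records that generation was skipped for this request.
    void setResponseIsNotModified() {
        notModified = true;
    }

    // Lets the CLI find out afterwards that nothing needs to be written.
    boolean wasNotModified() {
        return notModified;
    }
}
```

The CLI would construct the environment with the timestamp of the existing file, call Cocoon, and then check wasNotModified() before writing anything.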

However, if the Bean is able to send pages to various locations, it might not be able to 
find out when a page was last generated without incurring network traffic (e.g. when 
using FTP). This would be unfortunate, as a large site could involve a lot of network 
traffic, and the whole point of this is to avoid that.

I could store locally (in my own hashed up cache) the last modified date for the page 
and the list of links within the page, each time a page is generated. That way, when I 
am about to generate a page, I can easily get its timestamp. If I find that I don't need 
to generate the page, I can use my locally held list of links to follow.

Does this seem reasonable?
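To make the idea concrete, here is a minimal in-memory sketch. The names PageRecord and CrawlCache are hypothetical, nothing here exists in Cocoon yet, and persisting the cache between runs (which the idea depends on) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of what is known about one generated page.
class PageRecord {
    final long lastModified;
    final List<String> links;

    PageRecord(long lastModified, List<String> links) {
        this.lastModified = lastModified;
        // Defensive copy so later changes by the caller do not leak in.
        this.links = Collections.unmodifiableList(new ArrayList<>(links));
    }
}

// Hypothetical local cache keyed by page URI.
class CrawlCache {
    private final Map<String, PageRecord> records = new HashMap<>();

    // Called each time a page is generated.
    void record(String uri, long lastModified, List<String> links) {
        records.put(uri, new PageRecord(lastModified, links));
    }

    // Returns 0 if the page has never been generated.
    long lastModified(String uri) {
        PageRecord r = records.get(uri);
        return r == null ? 0L : r.lastModified;
    }

    // Links to follow when generation of the page is skipped.
    List<String> links(String uri) {
        PageRecord r = records.get(uri);
        return r == null ? Collections.<String>emptyList() : r.links;
    }
}
```

Before generating a page, the CLI would look up lastModified(uri) locally instead of asking the remote destination, and fall back to links(uri) for crawling when the page is unchanged.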

And finally, I now have code working to make the CLI use ModifiableSources rather 
than Destination objects. Do you think I still need to support the Destination interface 
(and deprecate it), or can I just delete it entirely?

Once I've got this going, I'll get on with attempting a VFS ModifiableSource (probably 
once I've had a three week holiday in South Africa!).

Thanks again.

Regards, Upayavira


