cocoon-dev mailing list archives

From "Bernhard Huber" <>
Subject Ant: Re: Adding XML searching with Lucene
Date Thu, 06 Dec 2001 16:53:01 GMT
Nice to hear that it works now.

> > Yes, separating is quite a good idea. It will speed up the indexing of
> > the local sites deployed in the same servlet engine.
> same cocoon, you mean.
Yes, of course. I was mixing this up with some JSP I wrote
which accesses the HTML pages via the ServletContext
(getRealPath()/getResource()), but that's not possible with Cocoon because
of the sitemap.

> > I have even thought about that the indexing step may act like the
> > profiler. Instead of collecting profile data about how long something
> > takes, update, or create the index information. This way the index is
> > kept up-to-date.
> > This way no explicit crawling is necessary for the internal docs.
> sorry but I didn't get it.
Let me explain it again:
As you wrote later, some time-triggered task ensures that
the index is kept up-to-date.
Now if I want to avoid that, i.e. scanning through all documents and
checking whether they have changed since the last index generation, I
have to have some other mechanism.
One mechanism would be that when an indexed document is requested,
it is checked whether the document is newer than its index entry.
The implementation of this mechanism would be a la profiler,
which alters the SAXConnectors or the Pipeline; I don't know that
by heart exactly.
The important questions seem to me:
When is the indexing done and who pays for it, and what is the maximum
allowed time during which document and index may differ?
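
To make the on-request check concrete, here is a minimal sketch. It is
plain Java and touches neither Cocoon nor Lucene; the class
LazyIndexTracker, its methods, and the Runnable that stands in for the
real index update are all hypothetical names I made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Cocoon or Lucene: it remembers which
// version of each document was indexed and decides, when the document is
// requested, whether the index entry has to be refreshed.
final class LazyIndexTracker {
    // uri -> last-modified timestamp (millis) of the version that was indexed
    private final Map<String, Long> indexedAt = new HashMap<>();

    // True if the document was never indexed or changed after it was indexed.
    boolean isStale(String uri, long lastModified) {
        Long indexed = indexedAt.get(uri);
        return indexed == null || lastModified > indexed;
    }

    // Called while serving a request. The Runnable is a placeholder for the
    // real index update. Returns true if this request paid for an update.
    boolean onRequest(String uri, long lastModified, Runnable reindex) {
        if (!isStale(uri, lastModified)) {
            return false;
        }
        reindex.run();
        indexedAt.put(uri, lastModified);
        return true;
    }
}
```

With this scheme the first request after a change pays for the indexing,
and a document that is never requested is never indexed, so it probably
only complements a crawler rather than replacing it.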

The simple time-triggered indexer is just one solution.

Another solution is that a serializer or transformer of a view
writes the index. The only problem for the transformer/serializer is
to know when to close the IndexWriter if it is creating the index
from scratch. Just updating might work inside a transformer/serializer.
In that case we still might need some time-triggered task that removes
the Lucene documents of deleted site documents.
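
The cleanup step could look roughly like the sketch below. It only
computes which index entries have no document on the site any more; the
actual Lucene delete call and the timer that triggers the task are left
out. IndexCleanup and orphaned() are hypothetical names, and I assume
that every Lucene document stores the URI it was built from.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for the time-triggered cleanup task.
final class IndexCleanup {
    // Returns the URIs that are present in the index but no longer exist
    // on the site. The input sets are not modified.
    static Set<String> orphaned(Set<String> indexedUris, Set<String> siteUris) {
        Set<String> result = new TreeSet<>(indexedUris);
        result.removeAll(siteUris);
        return result;
    }
}
```

The task would then delete the Lucene document of each returned URI.
Since this only compares two sets of names, it should be much cheaper
than re-crawling and re-indexing the whole site.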

Well, that's all for now. 

bye, Bernhard

Huber Bernhard
