cocoon-dev mailing list archives

From giacomo <>
Subject Re: Ant: Re: Adding XML searching with Lucene
Date Sun, 09 Dec 2001 20:15:52 GMT
On Fri, 7 Dec 2001, Stefano Mazzocchi wrote:

> Bernhard Huber wrote:
> > > > I have even thought that the indexing step could act like the
> > > > profiler: instead of collecting profile data about how long
> > > > something takes, it would update or create the index information.
> > > > This way the index is kept up-to-date, and no explicit crawling
> > > > is necessary for the internal docs.
> > >
> > > sorry but I didn't get it.
> > >
> >
> > Let me explain it again:
> > As you wrote later, some time-triggered task ensures that
> > the index is kept up-to-date.
> > Now if I want to avoid scanning through all documents and
> > checking whether they have changed since the last index
> > generation, I have to have some other mechanism.
> > One mechanism would be that when an indexed document is
> > requested, it checks whether it is newer than its index entry.
> > The implementation of this mechanism would be a la profiler,
> > which alters the SAXConnectors or the Pipeline -- I don't know
> > that by heart exactly.
> > The important questions seem to me:
> > When is the indexing done and who pays for it, and what is the
> > maximum time for which document and index are allowed to differ?
> >
> > The simple time-triggered indexer is just one solution.
> An internal crawler might connect to the cache information and avoid
> indexing something that is still valid in cache (if it's valid in cache
> and it's already present in the index, then it's valid in the index)
> > Another solution is that a serializer, or transformer of a view
> > writes the index.
> Hmmm,
> > The only problem for the transformer/serializer is
> > to know when to close the IndexWriter if it is creating the index
> > from scratch. Just updating might work inside a transformer/serializer.
> > In that case we still might need some time-triggered task to remove
> > the Lucene documents of pages that were deleted from the site.
> My SoC alarm started ringing: the sitemap components should have no
> notion of indexing. The entire crawling/indexing/searching phase happens
> externally or we'll have concern overlap.

A Serializer is indeed the wrong place (SoC) but an IndexingTransformer
put before the Serializer in a pipe would keep SoC, don't you think?
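
To make the IndexingTransformer idea concrete, here is a minimal sketch using only the JDK's SAX classes. It is not Cocoon's Transformer interface and makes no Lucene calls; the class name IndexingFilterSketch and the method collect are hypothetical. It only shows the principle: the filter copies character data for a later indexing step and forwards every event unchanged, so the next pipeline stage (the Serializer) never sees a difference.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: a pass-through SAX filter that collects text for indexing.
class IndexingFilterSketch extends XMLFilterImpl {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);      // keep a copy for the indexer
        super.characters(ch, start, length); // forward unchanged to the next stage
    }

    String collectedText() {
        return text.toString();
    }

    // Parses the given XML through the filter and returns the collected text.
    static String collect(String xml) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            IndexingFilterSketch filter = new IndexingFilterSketch();
            filter.setParent(parser);
            // A no-op handler stands in for the Serializer at the end of the pipe.
            filter.setContentHandler(new DefaultHandler());
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.collectedText();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real pipeline the collected text would be handed to the indexer at endDocument; the question of when to close the IndexWriter, raised above, is left open here.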

> But at the same time, it would be nice to have a synchronous way to
> trigger reindexing of recently modified content (say, a page just
> edited). This could be done by calling a specific behavior on the
> 'cocoon' component (which is the engine).

Exactly, synchronous indexing is key to having an (almost) always
up-to-date index for searching.
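
The "is the document newer than its index entry" check discussed above could look roughly like this. It is a sketch under assumptions: the class IndexFreshnessSketch, its methods, and the in-memory map of timestamps are all hypothetical, and the actual indexing call is only marked by a comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remembers when each URI was last indexed and
// reindexes only when the document is newer than its index entry.
class IndexFreshnessSketch {
    private final Map<String, Long> indexedAt = new HashMap<>();

    // A document is stale if it was never indexed or changed after indexing.
    boolean isStale(String uri, long lastModified) {
        Long stamp = indexedAt.get(uri);
        return stamp == null || lastModified > stamp;
    }

    // Returns true if the document was (re)indexed by this call.
    boolean reindexIfStale(String uri, long lastModified) {
        if (!isStale(uri, lastModified)) {
            return false;
        }
        // The real indexing work would happen here.
        indexedAt.put(uri, lastModified);
        return true;
    }

    // Called when a document is deleted from the site.
    void remove(String uri) {
        indexedAt.remove(uri);
    }
}
```

Calling reindexIfStale right after a page is edited gives the synchronous trigger; a time-triggered task would still be needed to call remove for deleted documents.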

> Which leads me to think that making crawling, indexing and searching as
> Avalon components might be FS since we're not going to use any other
> implementation of these....

You don't have to use Avalon only when there are multiple
implementations. There could well be a single implementation for a
role. The CM manages its lifecycle and so on, and every Composer can
have access to it. That alone can justify making something a Component
IMO.
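
A plain-Java illustration of that point, deliberately not using the Avalon API: the interface Indexer, the class SimpleIndexer, and the registry RoleRegistrySketch are hypothetical stand-ins for a role, its single implementation, and the component manager. Clients depend only on the role, even though there is just one implementation behind it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical role: what clients see.
interface Indexer {
    String index(String uri);
}

// The single implementation of that role.
class SimpleIndexer implements Indexer {
    public String index(String uri) {
        return "indexed " + uri;
    }
}

// Hypothetical stand-in for a component manager: maps role names to components.
class RoleRegistrySketch {
    private final Map<String, Object> components = new HashMap<>();

    void register(String role, Object component) {
        components.put(role, component);
    }

    <T> T lookup(String role, Class<T> type) {
        Object component = components.get(role);
        if (component == null) {
            throw new IllegalStateException("no component for role " + role);
        }
        return type.cast(component);
    }
}
```

Any client that can reach the registry looks up the role by name and never creates or configures the indexer itself, which is the benefit argued for above.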

