cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: Ant: Re: Adding XML searching with Lucene
Date Fri, 07 Dec 2001 10:12:57 GMT
Bernhard Huber wrote:

> > > I have even thought that the indexing step may act like the
> > > profiler. Instead of collecting profile data about how long
> > > something takes, update or create the index information. This
> > > way the index is kept up-to-date.
> > > This way no explicit crawling is necessary for the internal docs.
> >
> > sorry but I didn't get it.
> >
>
> Let me explain it again:
> As you wrote later, some time-triggered task ensures that
> the index is kept up-to-date.
> Now if I want to avoid scanning through all documents and
> checking whether they have changed since the last index generation,
> I have to have some other mechanism.
> One mechanism would be that when an indexed document is requested,
> it is checked whether the document is newer than its index entry.
> Now the implementation of this mechanism would be a la profiler,
> which alters the SAXConnectors, or Pipeline -- I don't know that
> by heart exactly.
> What seems important to me:
> When is the indexing done and who pays for it, and what is the
> maximum allowed time for document and index to differ?
> 
> The simple time-triggered indexer is just one solution.
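
The on-request check described in the quoted text can be sketched roughly
as follows. This is only an illustration in plain Java, not Cocoon or
Lucene code: the class IndexFreshness and its methods are hypothetical
names, and the timestamps are assumed to be last-modified values in
milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers the document state each index entry reflects. */
class IndexFreshness {
    // uri -> last-modified timestamp of the document version that was indexed
    private final Map<String, Long> indexedAt = new HashMap<>();

    /** True if the document was never indexed or has changed since it was indexed. */
    boolean isStale(String uri, long lastModified) {
        Long indexed = indexedAt.get(uri);
        return indexed == null || lastModified > indexed;
    }

    /** Record that the document was indexed in the state it had at lastModified. */
    void markIndexed(String uri, long lastModified) {
        indexedAt.put(uri, lastModified);
    }
}
```

With something like this, the request path pays for indexing only when
isStale() returns true, and the maximum drift between document and index
is bounded by how often the document is requested.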

An internal crawler might connect to the cache information and avoid
indexing something that is still valid in cache (if it's valid in cache
and it's already present in the index, then it's valid in the index)
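
A minimal sketch of that rule, assuming the crawler can ask two questions
per URI: "is it still valid in the cache?" and "is it already in the
index?". The class name, method name, and the use of plain sets are all
assumptions made for illustration; this is not the Cocoon cache API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical crawler step: decide which URIs need (re)indexing. */
class CacheAwareCrawler {
    static List<String> selectForIndexing(List<String> uris,
                                          Set<String> validInCache,
                                          Set<String> inIndex) {
        List<String> todo = new ArrayList<>();
        for (String uri : uris) {
            // Valid in cache AND present in index => the index entry is still valid.
            boolean upToDate = validInCache.contains(uri) && inIndex.contains(uri);
            if (!upToDate) {
                todo.add(uri);
            }
        }
        return todo;
    }
}
```

Everything that is not provably up to date gets indexed, so a document
that is cached but was never indexed is still picked up.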

> Another solution is that a serializer, or transformer of a view
> writes the index. 

Hmmm,

> The only problem for the transformer/serializer is
> to know when to close the IndexWriter if it is creating the index
> from scratch. Just updating might work inside a transformer/serializer.
> In that case we still might need some time-triggered task removing
> the Lucene documents of pages deleted from the site.

My SoC alarm started ringing: the sitemap components should have no
notion of indexing. The entire crawling/indexing/searching phase happens
externally or we'll have concern overlap.

But at the same time, it would be nice to have a synchronous way to
trigger reindexing of recently modified content (say, a page just
edited). This could be done by calling a specific behavior on the
'cocoon' component (which is the engine).
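
Roughly, and only as a sketch: the engine exposes one call that delegates
to whatever does the indexing, so sitemap components never see it. The
names Indexer, Engine and reindex() are hypothetical and do not exist in
Cocoon.

```java
/** Hypothetical indexing service, living outside the sitemap. */
interface Indexer {
    void reindex(String uri);
}

/** Hypothetical stand-in for the 'cocoon' engine component. */
class Engine {
    private final Indexer indexer;

    Engine(Indexer indexer) {
        this.indexer = indexer;
    }

    /** Synchronous trigger: returns only after the indexer has processed the URI. */
    void reindex(String uri) {
        indexer.reindex(uri);
    }
}
```

An editing application would call engine.reindex(uri) right after saving a
page, while the regular crawl keeps running on its own schedule.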

Which leads me to think that turning crawling, indexing and searching
into Avalon components might be FS (flexibility syndrome), since we're
not going to use any other implementation of these...

What do you think?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------


