lucy-dev mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-dev] On Transactionality and Performance
Date Thu, 24 Mar 2011 01:47:46 GMT
David E. Wheeler wrote on 3/23/11 8:01 PM:

> Okay. So how expensive is it, really, to create a new indexer for each
> distribution I index, rather than for all those being indexed in a session?
> Or is there READ access for searchers while an indexer is indexing stuff?
> 

The index is definitely available for searching while the indexer is doing its
work. An existing searcher becomes stale, though, as soon as $indexer->commit()
is called: it will not have access to the recently-added segment(s) until it is
re-opened.

Here, for example, is how I manage searchers:
http://cpansearch.perl.org/src/KARMAN/SWISH-Prog-KSx-0.18/lib/SWISH/Prog/KSx/Searcher.pm

Note the get_ks() method, which tracks a UUID per index and re-opens a new
searcher whenever the UUID changes.
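
As a rough illustration of that pattern (not the code from the module linked
above), here is a minimal, language-neutral sketch in Python. It assumes the
indexing side writes a fresh UUID to a marker file after every commit. The
file name "index.uuid", the stamp_uuid() helper, and the open_searcher
callback are all hypothetical names made up for this example.

```python
import os
import uuid

MARKER = "index.uuid"  # hypothetical marker file name, not part of any real index format


def stamp_uuid(index_path):
    """Indexing side: write a new UUID after each commit (assumed convention)."""
    marker = os.path.join(index_path, MARKER)
    tmp = marker + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(str(uuid.uuid4()))
    os.replace(tmp, marker)  # rename so readers never see a half-written file


def read_uuid(index_path):
    """Searching side: read the UUID currently stamped on the index."""
    with open(os.path.join(index_path, MARKER)) as fh:
        return fh.read().strip()


class SearcherCache:
    """Hand out one searcher per index, re-opening it when the UUID changes."""

    def __init__(self, open_searcher):
        # open_searcher is a hypothetical factory: index path -> searcher object
        self._open_searcher = open_searcher
        self._cache = {}  # index path -> (uuid seen at open time, searcher)

    def get(self, index_path):
        current = read_uuid(index_path)
        cached = self._cache.get(index_path)
        if cached is None or cached[0] != current:
            # First request, or the index was committed to since we opened it.
            cached = (current, self._open_searcher(index_path))
            self._cache[index_path] = cached
        return cached[1]
```

The check costs one small file read per request, and an expensive re-open only
happens when a commit has actually taken place.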

Marvin's comments about the efficiency of indexers and the advantage of
"batching up" your indexed documents describe merely that: an advantage and an
efficiency, not a requirement.

In my pipeline, the processes that serialize my incoming data (analogous to
unpacking .tar files and converting/normalizing their contents into something
index-able) are separate from the indexers that actually parse/tokenize/insert
those documents. It's up to the searcher(s) (in my case) to detect whether they
should refresh themselves.

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
