lucy-dev mailing list archives

From Peter Karman
Subject Re: [lucy-dev] On Transactionality and Performance
Date Thu, 24 Mar 2011 01:47:46 GMT
David E. Wheeler wrote on 3/23/11 8:01 PM:

> Okay. So how expensive is it, really, to create a new indexer for each
> distribution I index, rather than for all those being indexed in a session?
> Or is there READ access for searchers while an indexer is indexing stuff?

The index is definitely available for searching while the indexer is doing its
work. The searcher will become stale, though, as soon as $indexer->commit() is
called, and the existing searcher will not have access to the recently-added
documents.

Here, for example, is how I manage searchers:

Note the get_ks() method, which tracks a UUID per index and re-opens a new
searcher whenever the UUID changes.
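
The re-open-on-change idea can be sketched roughly as follows. This is a toy
model in Python, not Lucy's API and not the get_ks() code itself: ToyIndex,
ToySearcher, SearcherCache and get_searcher() are hypothetical names, and the
assumption that every commit gives the index a fresh UUID is made up for the
illustration.

```python
import uuid


class ToyIndex:
    """Stand-in for an on-disk index (hypothetical, not Lucy's API)."""

    def __init__(self):
        self.docs = []                  # committed documents
        self.uuid = str(uuid.uuid4())   # assumed to change on every commit

    def commit(self, new_docs):
        # Publish the new documents and give the index a fresh UUID.
        self.docs = self.docs + list(new_docs)
        self.uuid = str(uuid.uuid4())


class ToySearcher:
    """Sees only the snapshot of the index taken when it was opened."""

    def __init__(self, index):
        self.snapshot = list(index.docs)

    def search(self, term):
        return [doc for doc in self.snapshot if term in doc]


class SearcherCache:
    """Hands out a cached searcher, re-opening it when the UUID changes."""

    def __init__(self, index):
        self.index = index
        self.seen_uuid = None
        self.searcher = None

    def get_searcher(self):
        current = self.index.uuid
        if self.searcher is None or current != self.seen_uuid:
            # The index changed since the searcher was opened: replace it.
            self.searcher = ToySearcher(self.index)
            self.seen_uuid = current
        return self.searcher


index = ToyIndex()
index.commit(["alpha doc"])
cache = SearcherCache(index)

old = cache.get_searcher()
index.commit(["beta doc"])          # the indexer commits while 'old' is open
print(old.search("beta"))           # the stale searcher misses the new doc
print(cache.get_searcher().search("beta"))  # a re-opened searcher finds it
```

The check is cheap (one string comparison per request), so it can run every
time a searcher is requested.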

Marvin's comments about the efficiency of indexers and the advantage of
"batching up" your indexed documents describe merely that: an advantage and an
efficiency.
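
To make the trade-off concrete, here is a small Python toy, again not Lucy
code: ToyStore, ToyIndexer and the two helper functions are hypothetical, and
the model only counts indexer sessions and commits. It says nothing about what
a commit actually costs in Lucy.

```python
class ToyStore:
    """Hypothetical index store that counts how many commits it received."""

    def __init__(self):
        self.docs = []
        self.commits = 0


class ToyIndexer:
    """Hypothetical indexer session: buffers docs, publishes on commit()."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def add_doc(self, doc):
        self.pending.append(doc)

    def commit(self):
        self.store.docs.extend(self.pending)
        self.pending = []
        self.store.commits += 1


def index_one_by_one(store, docs):
    # One indexer session (and one commit) per document.
    for doc in docs:
        indexer = ToyIndexer(store)
        indexer.add_doc(doc)
        indexer.commit()


def index_in_batches(store, docs, batch_size):
    # One indexer session (and one commit) per batch of documents.
    for start in range(0, len(docs), batch_size):
        indexer = ToyIndexer(store)
        for doc in docs[start:start + batch_size]:
            indexer.add_doc(doc)
        indexer.commit()


docs = ["dist-%d" % n for n in range(10)]

single, batched = ToyStore(), ToyStore()
index_one_by_one(single, docs)
index_in_batches(batched, docs, batch_size=4)

print(single.commits, batched.commits)   # sessions used by each approach
print(single.docs == batched.docs)       # same index contents either way
```

Both approaches end with the same documents indexed; batching only reduces
how many sessions and commits it takes to get there.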

In my pipeline, I have separate processes that serialize my incoming data
(analogous to unpacking .tar files and converting/normalizing their contents
into something index-able) and the indexers that actually parse/tokenize/insert
those documents. It's up to the searcher(s) (in my case) to detect whether they
should refresh themselves.

Peter Karman
