lucene-java-user mailing list archives

From Jake Mannix <>
Subject Re: Realtime & distributed
Date Fri, 09 Oct 2009 03:42:44 GMT
On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen <
> wrote:

> There is the Zoie system which uses the RAMDir
> solution,

Also, to clarify: zoie does not index into a RAMDir and then periodically
merge that down to disk. For one thing, that approach has a bad failure mode
when the system crashes: you lose the entire RAMDir and have to figure out how
far back to look in the transaction log to know how much to reindex.

Zoie instead indexes "redundantly": every incoming document is indexed into
a RAMDirectory *and* the FSDirectory simultaneously, but the disk IndexReader
for the FSDirectory is only reopened every 15 minutes or so, while the
IndexReader for the RAMDirectory is reopened for every query to guarantee
real-timeliness of the index.
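
A toy model of the redundant scheme described above, in plain Python rather
than Lucene or Zoie code. The class name, the lists standing in for the two
directories, and the explicit `now` argument are all hypothetical. It also
assumes the RAM side is dropped once the disk reader has been reopened, which
the message does not state.

```python
class RedundantIndex:
    """Toy model: every document is written to both a RAM and a disk store."""

    def __init__(self, disk_reopen_interval=15 * 60):
        self.ram_docs = []       # stands in for the RAMDirectory
        self.disk_docs = []      # stands in for the FSDirectory
        self.disk_snapshot = []  # what the disk reader saw at its last reopen
        self.disk_reopen_interval = disk_reopen_interval  # seconds
        self.last_disk_reopen = 0.0

    def add(self, doc):
        # "Redundant" indexing: the same document goes to both stores.
        self.ram_docs.append(doc)
        self.disk_docs.append(doc)

    def maybe_reopen_disk(self, now):
        # The disk reader is refreshed only once per interval.
        if now - self.last_disk_reopen >= self.disk_reopen_interval:
            self.disk_snapshot = list(self.disk_docs)
            # Assumption: the RAM copy is no longer needed once the
            # disk reader can see the same documents.
            self.ram_docs = []
            self.last_disk_reopen = now

    def search(self, term, now):
        self.maybe_reopen_disk(now)
        # The RAM view is taken fresh on every query, so new docs are visible.
        ram_view = list(self.ram_docs)
        return [d for d in self.disk_snapshot + ram_view if term in d]


idx = RedundantIndex(disk_reopen_interval=900)
idx.add("lucene realtime")
# Visible immediately through the RAM view, although the disk reader is stale.
print(idx.search("realtime", now=1))
```

A crash in this model loses only the RAM copy; the documents are already on
the disk side, which is the point of writing to both.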

The only case where zoie *isn't* realtime is when documents arrive faster
than they can be indexed into the RAMDirectory. If this is the case,
those updates will pile up in a queue being served by that indexing thread, and
won't be visible until that thread has caught up.  In practice, this doesn't happen
unless any given node is trying to index a hundred documents (depends on
of course) a second.
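
The backlog behaviour can be sketched with a simple queue. This is a
hypothetical model, not Zoie's indexing thread: `QueuedIndexer`, `tick`, and
the fixed per-tick capacity are assumptions made for illustration, with the
capacity set to the "hundred documents a second" figure mentioned above.

```python
from collections import deque


class QueuedIndexer:
    """Toy model of one indexing thread draining a queue of updates."""

    def __init__(self, docs_per_tick):
        self.pending = deque()  # updates waiting for the indexing thread
        self.visible = []       # updates already indexed and searchable
        self.docs_per_tick = docs_per_tick  # hypothetical capacity per second

    def submit(self, docs):
        self.pending.extend(docs)

    def tick(self):
        # One second of work: index at most docs_per_tick queued updates.
        for _ in range(min(self.docs_per_tick, len(self.pending))):
            self.visible.append(self.pending.popleft())

    def lag(self):
        # Number of updates that searches cannot see yet.
        return len(self.pending)


indexer = QueuedIndexer(docs_per_tick=100)
indexer.submit(["doc%d" % i for i in range(150)])  # burst above capacity
indexer.tick()
print(indexer.lag())  # updates still invisible after one tick
indexer.tick()
print(indexer.lag())  # backlog after the thread has caught up
```

As long as the arrival rate stays below the per-tick capacity, the lag returns
to zero after each tick and searches see every update.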

Of course, since the IndexWriter buffers some documents in RAM before
flushing to disk, you are not totally immune to system failures, but at least
zoie is no more susceptible to that than non-realtime search, as it's writing
directly to disk all the time as well (and yes, this is redundant, but ever
since the indexing speed improvements of Lucene 2.3, I've yet to see indexing
be the bottleneck anymore).

