lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: Realtime Search
Date Wed, 24 Dec 2008 18:23:48 GMT
> Also, what are the requirements?  Must a document be visible to search
> within 10ms of being added?

0-5ms.  Otherwise it's not realtime, it's batch indexing.  The realtime
system can support small batches by encoding them into RAMDirectories if
they are of sufficient size.

> Or must it be visible to search from the time that the call to add it
> returns?

Most people probably expect the update latency offered by SQL databases.

> As a baseline, how fast is it to simply use RAMDirectory?

It depends on how fast searches over the realtime index need to be.  Speed
suffers when there are many small segments whose data (terms, postings,
etc.) must be continuously decoded.  The advantage of MemoryIndex and
InstantiatedIndex is an actual increase in search speed compared with
RAMDirectory (see the Performance Notes at
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/memory/MemoryIndex.html)
and no need to continuously decode short-lived segments.

Anecdotal tests indicated that the merging overhead of RAMDirectory,
compared with MI or II (MemoryIndex or InstantiatedIndex), is significant
enough that RAMDirectory is only useful for batches in the 1000s of
documents, which does not seem to be what people expect from realtime
search.
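To make the segment-count cost concrete, here is a toy Java model. It is a sketch under stated assumptions: SegmentCostDemo, flush, search, and mergeAll are hypothetical names, it is not Lucene code, and it ignores term decoding, scoring, and deletions. Each flush of a small batch becomes its own segment, a search has to consult every segment, and merging collapses them into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not Lucene code): each segment holds its own term -> postings map. */
class SegmentCostDemo {
    static final class Segment {
        final Map<String, List<Integer>> postings = new TreeMap<>();

        void add(int docId, String... terms) {
            for (String t : terms) {
                postings.computeIfAbsent(t, k -> new ArrayList<>()).add(docId);
            }
        }
    }

    final List<Segment> segments = new ArrayList<>();
    int segmentsVisited = 0; // per-segment lookups, a stand-in for per-segment decode cost

    /** Each flush of a small batch creates a new segment. */
    void flush(int firstDocId, String[][] batch) {
        Segment s = new Segment();
        for (int i = 0; i < batch.length; i++) {
            s.add(firstDocId + i, batch[i]);
        }
        segments.add(s);
    }

    /** A search must consult every segment, so its cost grows with the segment count. */
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (Segment s : segments) {
            segmentsVisited++;
            List<Integer> p = s.postings.get(term);
            if (p != null) hits.addAll(p);
        }
        return hits;
    }

    /** Collapse all segments into one; doc ids are global here, so postings simply concatenate. */
    void mergeAll() {
        Segment merged = new Segment();
        for (Segment s : segments) {
            for (Map.Entry<String, List<Integer>> e : s.postings.entrySet()) {
                merged.postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        segments.clear();
        segments.add(merged);
    }

    public static void main(String[] args) {
        SegmentCostDemo idx = new SegmentCostDemo();
        for (int i = 0; i < 100; i++) {
            idx.flush(i, new String[][] {{"lucene", "doc" + i}}); // one tiny segment per document
        }
        idx.search("lucene");
        System.out.println("visited before merge: " + idx.segmentsVisited); // 100
        idx.segmentsVisited = 0;
        idx.mergeAll();
        idx.search("lucene");
        System.out.println("visited after merge: " + idx.segmentsVisited); // 1
    }
}
```

Real merges do considerably more work than this (rewriting postings, handling deletions), which the toy does not model; it only shows why leaving many tiny segments unmerged makes every search pay per-segment overhead.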

On Wed, Dec 24, 2008 at 9:53 AM, Doug Cutting <cutting@apache.org> wrote:

> Jason Rutherglen wrote:
>
>> 2) Implement realtime search by incrementally creating and merging readers
>> in memory.  The system would use MemoryIndex or InstantiatedIndex to quickly
>> (more quickly than RAMDirectory) create indexes from added documents.
>>
>
> As a baseline, how fast is it to simply use RAMDirectory?  If one, e.g.,
> flushes changes every 10ms or so, and has a background thread that uses
> IndexReader.reopen() to keep a fresh version for reading?
>
> Also, what are the requirements?  Must a document be visible to search
> within 10ms of being added?  Or must it be visible to search from the time
> that the call to add it returns?  In the latter case one might still use an
> approach like the above.  Writing a small new segment to a RAMDirectory and
> then, with no merging, calling IndexReader.reopen(), should be quite fast.
>  All merging could be done in the background, as should post-merge reopens()
> that involve large segments.
>
> In short, I wonder if new reader and writer implementations are in fact
> required or whether, perhaps with a few optimizations, the existing
> implementations might meet this need.
>
> Doug
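Doug's baseline (flush a small segment roughly every 10 ms with no merging, and have a background thread reopen a reader) can be sketched with only the JDK. This is an assumption-laden toy model: FlushReopenDemo, flushAndReopen, and the list-of-lists "segments" are hypothetical stand-ins, not Lucene's IndexWriter or IndexReader.reopen() API, and background merging is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Toy model (hypothetical names, not Lucene's API) of the flush-then-reopen baseline:
 * buffered docs are flushed as one small immutable segment, and searchers read an
 * immutable snapshot of the segment list that a refresh swaps in (standing in for reopen()).
 */
class FlushReopenDemo {
    private final List<String> buffer = new ArrayList<>();          // docs added since the last flush
    private final List<List<String>> segments = new ArrayList<>();  // flushed, immutable segments
    private final AtomicReference<List<List<String>>> snapshot =
            new AtomicReference<>(Collections.emptyList());         // what searchers currently see

    synchronized void addDocument(String doc) {
        buffer.add(doc);
    }

    /** Flush the buffer as one small segment (no merging) and publish a new snapshot. */
    synchronized void flushAndReopen() {
        if (!buffer.isEmpty()) {
            segments.add(Collections.unmodifiableList(new ArrayList<>(buffer)));
            buffer.clear();
        }
        snapshot.set(Collections.unmodifiableList(new ArrayList<>(segments)));
    }

    /** Searches never take the writer's lock; they read the last published snapshot. */
    boolean contains(String doc) {
        for (List<String> seg : snapshot.get()) {
            if (seg.contains(doc)) return true;
        }
        return false;
    }

    int visibleSegmentCount() {
        return snapshot.get().size();
    }

    public static void main(String[] args) throws InterruptedException {
        FlushReopenDemo index = new FlushReopenDemo();
        ScheduledExecutorService bg = Executors.newSingleThreadScheduledExecutor();
        // Background refresh roughly every 10 ms, as in the suggestion above.
        bg.scheduleAtFixedRate(index::flushAndReopen, 10, 10, TimeUnit.MILLISECONDS);

        index.addDocument("doc-1");
        Thread.sleep(50); // allow a few refresh cycles to run
        System.out.println("visible: " + index.contains("doc-1"));

        bg.shutdown();
        bg.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

The design point the sketch tries to capture is that searchers never block on the writer: they read an immutable snapshot, and a refresh only swaps in a new one. In this toy, a document becomes visible within about one refresh interval plus the cost of one small flush; merging those small segments would run separately in the background.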
