lucene-dev mailing list archives

From Wolfgang Hoschek <wolfgang.hosc...@mac.com>
Subject Re: MemoryIndex
Date Tue, 02 May 2006 17:33:40 GMT
MemoryIndex was designed to maximize performance for a specific use  
case: pure in-memory datastructure, at most one document per  
MemoryIndex instance, any number of fields, high frequency reads,  
high frequency index writes, no thread-safety required, optional  
support for storing offsets.
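
To make the use case above concrete, here is a minimal sketch of a single-document, multi-field, in-memory inverted index. It is not the actual MemoryIndex code: the class name SingleDocumentIndex, its methods, and the simplistic tokenizer are all hypothetical and chosen only for illustration. Token positions stand in for the optional offsets, and the class is deliberately not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical toy class, NOT Lucene's MemoryIndex. Holds one document; not thread-safe. */
final class SingleDocumentIndex {
    // field name -> (term -> token positions of that term within the field)
    private final Map<String, Map<String, List<Integer>>> fields = new HashMap<>();

    /** Lower-cases the text, splits on non-letter/non-digit runs, records term positions. */
    void addField(String fieldName, String text) {
        if (fields.containsKey(fieldName)) {
            throw new IllegalArgumentException("field already added: " + fieldName);
        }
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position++);
        }
        fields.put(fieldName, postings);
    }

    /** Positions of the term in the field, or an empty list if absent. */
    List<Integer> positions(String fieldName, String term) {
        Map<String, List<Integer>> postings = fields.get(fieldName);
        if (postings == null) {
            return Collections.emptyList();
        }
        List<Integer> found = postings.get(term);
        return found == null ? Collections.<Integer>emptyList() : found;
    }

    /** Number of occurrences of the term in the field. */
    int termFrequency(String fieldName, String term) {
        return positions(fieldName, term).size();
    }

    /** Drops the current document so the instance can be reused for the next one. */
    void reset() {
        fields.clear();
    }

    public static void main(String[] args) {
        SingleDocumentIndex index = new SingleDocumentIndex();
        index.addField("title", "Memory index, in memory");
        System.out.println(index.termFrequency("title", "memory"));
        System.out.println(index.positions("title", "index"));
    }
}
```

Because the structure only ever holds one document, there are no document ids, no segments, and nothing to merge, which is what keeps both writes and reads cheap in this sketch.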

I briefly considered extending it to the multi-document case, but  
eventually refrained from doing so, because I didn't really need such  
functionality myself (no itch). Here are some issues to consider when  
attempting such an extension:

- The internal datastructure would probably look quite different
- Datastructure/algorithmic trade-offs regarding time vs space, read  
vs. write frequency, common vs. less common use cases
- Hence, it may well turn out that there's not much to reuse.
- A priori, it isn't clear whether a new solution would be  
significantly faster than normal RAMDirectory usage. Thus...
- Need benchmark suite to evaluate the chosen trade-offs.
- Need tests to ensure correctness (in practice, meaning that it behaves  
just like the existing alternative).

I'd say it's a non-trivial undertaking. For example, right now, I  
don't have time for such an effort. That doesn't mean it's impossible  
or shouldn't be done, of course. If someone would like to run with it  
that would be great, but in light of the above issues, I'd suggest  
doing it in a new class (say MultiMemoryIndex or similar).

I believe Mark has done some initial work in that direction, based on  
an independent (and different) implementation strategy.

Wolfgang.

On May 2, 2006, at 12:25 AM, Robert Engels wrote:

> Along the lines of Lucene-550, what about having a MemoryIndex that
> accepts multiple documents, then wrote the index once at the end in
> the Lucene file format (so it could be merged) during close.
>
> When adding documents using an IndexWriter, a new segment is created
> for each document, and then the segments are periodically merged in
> memory, and/or with disk segments. It seems that when constructing an
> Index or updating a "lot" of documents in an existing index, the
> write, read, merge cycle is inefficient, and if the documents/field
> information were maintained in order (TreeMaps) greater efficiency
> would be realized.
>
> With a memory index, the memory needed during update will increase
> dramatically, but this could still be bounded, and a "disk based"
> index segment written when too many documents are in the memory index
> (max buffered documents).
>
> Does this "sound" like an improvement? Has anyone else tried
> something like this?
>
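
The buffering scheme proposed in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: BufferedSegmentBuilder and its methods are hypothetical names, the tokenizer is simplistic, and a "flushed segment" here is just a sorted in-memory map standing in for a segment written in the Lucene file format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: buffers postings in term order, flushes when the buffer is full. */
final class BufferedSegmentBuilder {
    private final int maxBufferedDocs;
    // term -> ascending doc ids; the TreeMap keeps terms sorted as they are added
    private TreeMap<String, List<Integer>> buffer = new TreeMap<>();
    // Stand-in for on-disk segments (assumption: a real version would write files here)
    private final List<SortedMap<String, List<Integer>>> flushedSegments = new ArrayList<>();
    private int bufferedDocs = 0;
    private int nextDocId = 0;

    BufferedSegmentBuilder(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Adds one document; flushes a segment once maxBufferedDocs documents are buffered. */
    void addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            List<Integer> docs = buffer.computeIfAbsent(token, k -> new ArrayList<>());
            // Doc ids only increase, so checking the tail is enough to avoid duplicates.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        if (++bufferedDocs >= maxBufferedDocs) {
            flush();
        }
    }

    /** Emits the buffered postings as one already-sorted segment and starts a new buffer. */
    void flush() {
        if (bufferedDocs == 0) {
            return;
        }
        flushedSegments.add(buffer);
        buffer = new TreeMap<>();
        bufferedDocs = 0;
    }

    /** Flushes whatever is still buffered, mirroring "write the index during close". */
    void close() {
        flush();
    }

    List<SortedMap<String, List<Integer>>> segments() {
        return flushedSegments;
    }
}
```

Whether this beats adding documents through an IndexWriter backed by a RAMDirectory is exactly the open question raised in the reply; the sketch shows only the shape of the idea and makes no performance claim.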

