lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2026) Refactoring of IndexWriter
Date Sat, 12 Dec 2009 11:31:18 GMT


Michael McCandless commented on LUCENE-2026:

bq. Zoie is a completely user-land solution which modifies no IW/IR internals and yet achieves
millisecond index-to-query-visibility turnaround while keeping speedy indexing and query performance.
It just keeps the RAMDir outside, encapsulated in an object (an IndexingSystem) which has IndexReaders
built off of both the RAMDir and the FSDir, and hides the implementation details (in fact,
the IW itself) from the user.

Right, one can always not use NRT and build their own layers on top.

But Zoie has *a lot* of code to accomplish this -- the devil really is
in the details of "simply write first to a RAMDir".  This is why I'd
like Earwin to look at Zoie and clarify his proposed approach, in
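
To make "the details" concrete, here is a deliberately tiny model of the two-tier idea: recent documents live in a RAM tier, older ones in a disk tier, and a reader view merges both. It is a hypothetical sketch (the class TwoTierIndex and its methods are invented for illustration), not Zoie's code and not a Lucene API; documents are plain string IDs and nothing is actually written to disk. Even at this size, deletes impose an ordering rule on flush.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-tier index (not Zoie's or Lucene's API); documents are plain string IDs.
class TwoTierIndex {
    private final List<String> ramDocs = new ArrayList<>();   // recent docs, searchable at once
    private final List<String> diskDocs = new ArrayList<>();  // older, flushed docs
    // Deletes already applied to the RAM tier but not yet to the disk tier.
    private final Set<String> pendingDiskDeletes = new HashSet<>();

    // New documents only touch the RAM tier, so they are visible immediately.
    void add(String id) {
        ramDocs.add(id);
    }

    // A delete must cover both tiers: RAM right away, disk lazily via the pending set.
    void delete(String id) {
        ramDocs.removeIf(id::equals);
        pendingDiskDeletes.add(id);
    }

    // Reader view: disk docs minus pending deletes, followed by everything in RAM.
    List<String> visibleDocs() {
        List<String> out = new ArrayList<>();
        for (String d : diskDocs) {
            if (!pendingDiskDeletes.contains(d)) {
                out.add(d);
            }
        }
        out.addAll(ramDocs);
        return out;
    }

    // Move the RAM tier to disk. Pending deletes must be applied BEFORE appending the
    // RAM docs; otherwise a doc that was deleted and then re-added would be lost.
    void flush() {
        diskDocs.removeIf(pendingDiskDeletes::contains);
        pendingDiskDeletes.clear();
        diskDocs.addAll(ramDocs);
        ramDocs.clear();
    }
}
```

The ordering inside flush() is exactly the kind of detail the comment refers to: applying the pending deletes after appending the RAM documents would silently drop a document that had been deleted and then re-added.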

Actually, here's a question: how quickly can Zoie turn around a
commit()?  Seems like it must take more time than Lucene, since it does
extra stuff (flush RAM buffers to disk, materialize deletes) before
even calling IW.commit.
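
The extra work in that commit path can be sketched as an ordered list of steps. This is a hypothetical model (LayeredCommit and its counters are invented names) that only records what would happen; "IndexWriter.commit()" appears as a label, not a real call. It assumes, as described above, that the layer must flush its buffered RAM documents and materialize pending deletes before it can call the underlying commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a commit that goes through a layer buffering above IndexWriter.
// It only records the steps it would take; nothing here calls Lucene.
class LayeredCommit {
    final List<String> steps = new ArrayList<>();
    int bufferedDocs;    // docs held in the layer's RAM tier
    int pendingDeletes;  // deletes not yet applied to the on-disk index

    void commit() {
        // Extra step 1: push RAM-buffered docs down to the disk index.
        if (bufferedDocs > 0) {
            steps.add("flush " + bufferedDocs + " buffered docs to disk");
            bufferedDocs = 0;
        }
        // Extra step 2: apply deletes that so far only existed in the layer.
        if (pendingDeletes > 0) {
            steps.add("materialize " + pendingDeletes + " deletes");
            pendingDeletes = 0;
        }
        // Only now can the underlying writer commit (a label, not a real call).
        steps.add("IndexWriter.commit()");
    }
}
```

The first two steps are the cost such a layer adds on top of IW's own commit, so the time for commit() to return grows with whatever is still buffered in the layer.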

At the end of the day, any NRT system has to trade safety for
performance (for example, bypassing the sync call in the NRT reader)...
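
The "sync call" is an fsync. In Java the analogous operation is FileChannel.force(true). The sketch below (the class SyncTradeoff and its write method are hypothetical, and this is not Lucene's code) shows the two paths: returning right after write(), where the bytes are already visible to readers through the OS cache but may be lost on a crash or power failure, versus forcing them to stable storage first, which is safe but slower.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the safety-vs-latency trade-off behind "skipping the sync".
class SyncTradeoff {
    // Writes data to file. If durable is true, forces it to stable storage before returning.
    static void write(Path file, String data, boolean durable) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (durable) {
                // Safe, slow path: fsync file data and metadata (the commit-style path).
                ch.force(true);
            }
            // Fast path: without force(), the bytes are visible to readers via the OS
            // cache right away, but are not guaranteed to survive a crash.
        }
    }
}
```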

bq. The API for this kind of thing doesn't have to be tightly coupled, and I would agree with
you that it shouldn't be.

I don't consider NRT today to be a tight coupling (e.g., the pending
refactoring of IW would nicely separate it out).  If we implement the
IR that searches DW's RAM buffer, then I'd agree ;)

> Refactoring of IndexWriter
> --------------------------
>                 Key: LUCENE-2026
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter and, as the
> name says, its job would be to write one particular index segment. The
> default one, just as today, would provide methods to add documents and
> flush when its buffer is full.
> Other SegmentWriter implementations would do things like appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow users to manage external data structures and
> keep them in sync with Lucene's data during segment merges.
> API-wise, there are things we have to figure out, such as where the
> updateDocument() method would fit in, because its deletion part
> affects all segments, whereas the new document is only being added to
> the new segment.
> Of course, these should be lower-level APIs for things like parallel
> indexing and related use cases. That's why we should still provide
> easy-to-use APIs like today for people who don't need to care about
> per-segment ops during indexing. So the current IndexWriter could
> probably keep most of its APIs and delegate to the new classes.
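
A minimal sketch of the proposed split, assuming documents are (id, body) string pairs and a segment is just a map. Every type here (SegmentWriter, BufferingSegmentWriter, SegmentManager) is hypothetical and none of it is Lucene API; DeletionPolicy, MergePolicy and MergeScheduler are left out. It mainly shows why updateDocument() straddles the two components.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed split; none of these types exist in Lucene.
// A "segment" is modeled as a map from document id to document body.
class SplitWriterSketch {

    // Writes exactly one segment at a time.
    interface SegmentWriter {
        void addDocument(String id, String body);
        // Returns the finished segment and starts a fresh, empty one.
        Map<String, String> flush();
    }

    // Default implementation: buffer documents in RAM until flushed.
    static class BufferingSegmentWriter implements SegmentWriter {
        private Map<String, String> buffer = new HashMap<>();

        public void addDocument(String id, String body) {
            buffer.put(id, body);
        }

        public Map<String, String> flush() {
            Map<String, String> segment = buffer;
            buffer = new HashMap<>();
            return segment;
        }
    }

    // Owns the list of segments. In the proposal, DeletionPolicy, MergePolicy and
    // MergeScheduler would plug in here; they are omitted from this sketch.
    static class SegmentManager {
        final List<Map<String, String>> segments = new ArrayList<>();
        private final SegmentWriter current = new BufferingSegmentWriter();

        // The awkward case from the issue: the delete half must visit every existing
        // segment, while the add half only touches the segment being written.
        void updateDocument(String id, String body) {
            for (Map<String, String> segment : segments) {
                segment.remove(id); // stands in for recording a deletion
            }
            current.addDocument(id, body);
        }

        // Hand the finished segment from the writer to the manager.
        void flush() {
            Map<String, String> segment = current.flush();
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
    }
}
```

Real Lucene segments are write-once and record deletions separately rather than being edited in place; the in-place remove() above only stands in for that bookkeeping.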

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
