lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <>
Subject [jira] Commented: (LUCENE-600) ParallelWriter companion to ParallelReader
Date Mon, 31 Aug 2009 23:38:32 GMT


Chuck Williams commented on LUCENE-600:

A given logical Document must have the same doc-id in each subindex, which is maintained by
using a merge policy that guarantees consistency across the subindexes, either merge-by-count
or merge-by-size as dictated by the size-dominant subindex.
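
To illustrate why the merges have to be coordinated, here is a minimal sketch in plain Java. It does not use Lucene; the class LockstepMergeSketch and its methods are hypothetical. It assumes a simplified model in which a subindex is a list of segments, a deleted document is a null slot, and merging drops deleted slots and so renumbers the survivors. Applying the same merge to every subindex keeps doc-ids aligned; merging only one of them does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Lucene code: a subindex is a list of segments and
// a segment is a list of field values, where null marks a deleted document.
class LockstepMergeSketch {

    // Merge segments [from, to) into one segment, dropping deleted (null)
    // slots. Dropping them is what renumbers the surviving documents.
    static List<List<String>> merge(List<List<String>> segments, int from, int to) {
        List<String> merged = new ArrayList<>();
        for (int i = from; i < to; i++) {
            for (String doc : segments.get(i)) {
                if (doc != null) {
                    merged.add(doc);
                }
            }
        }
        List<List<String>> result = new ArrayList<>(segments.subList(0, from));
        result.add(merged);
        result.addAll(segments.subList(to, segments.size()));
        return result;
    }

    // The doc-id of a value is its position when the segments are read in
    // order, counting deleted slots that have not been merged away yet.
    static List<String> byDocId(List<List<String>> segments) {
        List<String> flat = new ArrayList<>();
        for (List<String> segment : segments) {
            flat.addAll(segment);
        }
        return flat;
    }

    public static void main(String[] args) {
        // Two subindexes holding different fields of the same four documents;
        // document 1 has been deleted in both.
        List<List<String>> body = Arrays.asList(
                Arrays.asList("body0", null), Arrays.asList("body2", "body3"));
        List<List<String>> tags = Arrays.asList(
                Arrays.asList("tags0", null), Arrays.asList("tags2", "tags3"));

        // Same merge applied to both subindexes: doc-ids stay aligned.
        System.out.println(byDocId(merge(body, 0, 2)));
        System.out.println(byDocId(merge(tags, 0, 2)));

        // Merge applied to only one subindex: the other still counts the
        // deleted slot, so the same doc-id now names different documents.
        System.out.println(byDocId(tags));
    }
}
```

In this model the merge decision would be taken once, for the size-dominant subindex, and then replayed with the same segment boundaries on the others.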

I just read your wiki page and it looks like your MasterMergePolicy is the same for the merge-by-size
case, right?

We've been using parallel incremental indexing in production apps now for a long time, along
with the efficient update mechanism described in the patent app.

The original company I did this work for was acquired by a larger company that now owns the
IP.  I don't know how they would feel about a contribution of the latest version of ParallelWriter,
which works with the current Lucene.  I could inquire if you are truly open to it, but it
sounds like you may be on your own path to a quite similar thing.

Your wiki page says, "when you need to reindex this field you can simply create a new generation
of this parallel index and fill it with the new values".  That is the rub of the problem,
and the one we created an efficient algorithm and implementation for several years ago.  ParallelWriter
is the easy part.

> ParallelWriter companion to ParallelReader
> ------------------------------------------
>                 Key: LUCENE-600
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>            Priority: Minor
>         Attachments: ParallelWriter.patch
> A new class ParallelWriter is provided that serves as a companion to ParallelReader.
> ParallelWriter meets all of the doc-id synchronization requirements of ParallelReader,
> subject to:
>     1.  ParallelWriter.addDocument() is synchronized, which might have an adverse effect
>         on performance.  The writes to the sub-indexes are, however, done in parallel.
>     2.  The application must ensure that the ParallelReader is never reopened inside
>         ParallelWriter.addDocument(), else it might find the sub-indexes out of sync.
>     3.  The application must deal with recovery from ParallelWriter.addDocument() exceptions.
>         Recovery must restore the synchronization of doc-ids, e.g. by deleting any trailing
>         document(s) in one sub-index that were not successfully added to all sub-indexes,
>         and then optimizing all sub-indexes.
> A new interface, Writable, is provided to abstract IndexWriter and ParallelWriter.  This
> is in the same spirit as the existing Searchable and Fieldable classes.
> This implementation uses Java 1.5.  The patch applies against today's svn head.  All
> tests pass, including the new TestParallelWriter.
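
The constraints listed in the issue description can be sketched roughly as follows. This is not the attached ParallelWriter.patch and it does not use the Lucene API: the class ParallelAddSketch, its methods, and the use of a null value to simulate a failed write are all assumptions made for illustration. Each sub-index is modeled as an in-memory list whose positions stand in for doc-ids. For brevity the sketch performs recovery inside the writer, whereas the description leaves recovery to the application; the optimize step is omitted because the list model has no deleted slots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the patch: each sub-index is an in-memory list
// and the position of a value in that list plays the role of its doc-id.
class ParallelAddSketch {
    private final List<List<String>> subIndexes = new ArrayList<>();
    private final ExecutorService pool;

    ParallelAddSketch(int subIndexCount) {
        for (int i = 0; i < subIndexCount; i++) {
            subIndexes.add(new ArrayList<>());
        }
        pool = Executors.newFixedThreadPool(subIndexCount);
    }

    // Synchronized so that only one document is in flight at a time; the
    // writes to the individual sub-indexes still run in parallel.
    // Takes one value per sub-index; a null value simulates a failed write.
    // Returns the new doc-id, or -1 if the add failed and was rolled back.
    synchronized int addDocument(String... values) {
        if (values.length != subIndexes.size()) {
            throw new IllegalArgumentException("one value per sub-index required");
        }
        int docId = subIndexes.get(0).size();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            final List<String> target = subIndexes.get(i);
            final String value = values[i];
            pending.add(pool.submit(() -> {
                if (value == null) {
                    throw new IllegalStateException("simulated write failure");
                }
                target.add(value);
            }));
        }
        boolean failed = false;
        for (Future<?> write : pending) {
            try {
                write.get(); // wait for every write before deciding
            } catch (ExecutionException e) {
                failed = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failed = true;
            }
        }
        if (failed) {
            recover(docId);
            return -1;
        }
        return docId;
    }

    // Restore doc-id synchronization by deleting trailing documents that
    // were not successfully added to all sub-indexes.
    private void recover(int expectedSize) {
        for (List<String> subIndex : subIndexes) {
            while (subIndex.size() > expectedSize) {
                subIndex.remove(subIndex.size() - 1);
            }
        }
    }

    synchronized int size(int subIndex) {
        return subIndexes.get(subIndex).size();
    }

    void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ParallelAddSketch writer = new ParallelAddSketch(2);
        System.out.println(writer.addDocument("body0", "tags0"));
        System.out.println(writer.addDocument("body1", null)); // rolled back
        System.out.println(writer.size(0) + " " + writer.size(1));
        writer.close();
    }
}
```

A reader opened while addDocument() is still running could observe one sub-index ahead of the others, which is the situation item 2 asks the application to avoid.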

