lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-600) ParallelWriter companion to ParallelReader
Date Mon, 31 Aug 2009 22:23:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749656#action_12749656 ]

Chuck Williams commented on LUCENE-600:
---------------------------------------

The version attached here is more than 3 years old.  Our version has evolved along with Lucene,
and the whole apparatus is fully functional with the latest Lucene.

The fields in each subindex are disjoint.  A logical Document is the collection of all fields
from the real Documents that share the same doc-id across the real subindexes (i.e., the model
Doug started with ParallelReader).  Deletion by query or term is not an issue, because it deletes
the whole logical Document.  Field updates in our scheme don't use deletion.
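
To make the model concrete, here is a minimal sketch in plain Java.  It is not Lucene's API and
not our implementation: the class LogicalDocumentSketch, the method logicalDocument(), and the
representation of a subindex as a list of field maps (list position standing in for doc-id) are
all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Lucene's API: a subindex is a List of field maps,
// and the position in that list stands in for the doc-id.
class LogicalDocumentSketch {

    // Collects the fields stored under docId in every subindex.
    static Map<String, String> logicalDocument(
            List<List<Map<String, String>>> subIndexes, int docId) {
        Map<String, String> logical = new LinkedHashMap<>();
        for (List<Map<String, String>> subIndex : subIndexes) {
            for (Map.Entry<String, String> field : subIndex.get(docId).entrySet()) {
                // Fields are disjoint across subindexes, so a repeated name is an error.
                if (logical.putIfAbsent(field.getKey(), field.getValue()) != null) {
                    throw new IllegalStateException(
                            "field present in more than one subindex: " + field.getKey());
                }
            }
        }
        return logical;
    }
}
```

With one subindex holding a "body" field and another holding a "tag" field, asking for doc-id 1
returns a single map containing both fields of the documents stored at position 1.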

Merge-by-size is only an issue if you allow it to be decided independently in each subindex.
In practice that independence is not very important, since one subindex is size-dominant (the
one containing the document body field).  One can merge-by-size that subindex and force the
others to merge consistently.
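
A rough sketch of that idea, again in plain Java rather than Lucene's merge-policy API.  The
class ConsistentMergeSketch, both methods, and the toy policy (merge the first adjacent pair of
segments whose combined size fits under a threshold) are assumptions for illustration only.  The
point is that the decision is made once, on the dominant subindex, and the same segment range is
then merged in every subindex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Lucene's API: segments are described only by
// their byte size (dominant subindex) or document count (any subindex).
class ConsistentMergeSketch {

    // Toy size-based policy (an assumption): choose the first adjacent pair of
    // segments whose combined size is at most maxMergedBytes.
    // Returns {start, endExclusive}, or null when no pair qualifies.
    static int[] chooseMerge(List<Long> dominantSegmentBytes, long maxMergedBytes) {
        for (int i = 0; i + 1 < dominantSegmentBytes.size(); i++) {
            long combined = dominantSegmentBytes.get(i) + dominantSegmentBytes.get(i + 1);
            if (combined <= maxMergedBytes) {
                return new int[] {i, i + 2};
            }
        }
        return null;
    }

    // Applies the chosen range to one subindex: segments [start, endExclusive)
    // collapse into a single segment, all others are kept in order.
    static List<Integer> applyMerge(List<Integer> segmentDocCounts, int start, int endExclusive) {
        List<Integer> merged = new ArrayList<>(segmentDocCounts.subList(0, start));
        int docs = 0;
        for (int i = start; i < endExclusive; i++) {
            docs += segmentDocCounts.get(i);
        }
        merged.add(docs);
        merged.addAll(segmentDocCounts.subList(endExclusive, segmentDocCounts.size()));
        return merged;
    }
}
```

Calling chooseMerge() on the body subindex and then applyMerge() with the returned range on
every subindex keeps the segment boundaries aligned across all of them.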

The only reason for the corresponding-segment constraint is that deletion changes doc-ids
when deleted documents are purged.  I know some Lucene apps address this by never purging deleted
documents, which is OK in some domains where deletion is rare.  I think there are other ways
to resolve it as well.
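
The doc-id shift is easy to see in a toy model.  The sketch below is plain Java, not Lucene
code; PurgeShiftSketch and purge() are hypothetical names, and a subindex is assumed to be just
a list whose positions are the doc-ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Lucene's API: list position stands in for doc-id.
class PurgeShiftSketch {

    // Drops the deleted documents; every survivor after a deleted one moves
    // to a lower position, i.e. gets a new doc-id.
    static List<String> purge(List<String> docs, Set<Integer> deletedDocIds) {
        List<String> survivors = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (!deletedDocIds.contains(docId)) {
                survivors.add(docs.get(docId));
            }
        }
        return survivors;
    }
}
```

If one subindex holds [a0, a1, a2] and doc-id 1 is purged there, a2 moves to doc-id 1.  A
parallel subindex that has not purged yet still has its own second document at doc-id 1, so the
two no longer line up until it purges the same document as well.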



> ParallelWriter companion to ParallelReader
> ------------------------------------------
>
>                 Key: LUCENE-600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-600
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>            Priority: Minor
>         Attachments: ParallelWriter.patch
>
>
> A new class ParallelWriter is provided that serves as a companion to ParallelReader.
> ParallelWriter meets all of the doc-id synchronization requirements of ParallelReader, subject
> to:
>     1.  ParallelWriter.addDocument() is synchronized, which might have an adverse effect
>         on performance.  The writes to the sub-indexes are, however, done in parallel.
>     2.  The application must ensure that the ParallelReader is never reopened inside
>         ParallelWriter.addDocument(), else it might find the sub-indexes out of sync.
>     3.  The application must deal with recovery from ParallelWriter.addDocument() exceptions.
>         Recovery must restore the synchronization of doc-ids, e.g. by deleting any trailing
>         document(s) in one sub-index that were not successfully added to all sub-indexes,
>         and then optimizing all sub-indexes.
> A new interface, Writable, is provided to abstract IndexWriter and ParallelWriter.  This
> is in the same spirit as the existing Searchable and Fieldable classes.
> This implementation uses java 1.5.  The patch applies against today's svn head.  All
> tests pass, including the new TestParallelWriter.
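
The recovery step in item 3 of the quoted description can be pictured with a small model.  This
is plain Java and an illustration only: TrailingDocRecoverySketch and resynchronize() are
hypothetical names, sub-indexes are assumed to be mutable lists, and the optimize step is not
modelled.

```java
import java.util.List;

// Hypothetical model, not the patch's code: a sub-index is a mutable List
// whose positions are the doc-ids.
class TrailingDocRecoverySketch {

    // Trims every sub-index to the document count that all of them reached,
    // removing trailing documents that were not added everywhere.
    // Returns the common document count.
    static int resynchronize(List<List<String>> subIndexes) {
        if (subIndexes.isEmpty()) {
            return 0;
        }
        int common = Integer.MAX_VALUE;
        for (List<String> subIndex : subIndexes) {
            common = Math.min(common, subIndex.size());
        }
        for (List<String> subIndex : subIndexes) {
            // Clearing the subList view removes the trailing elements in place.
            subIndex.subList(common, subIndex.size()).clear();
        }
        return common;
    }
}
```

If an add succeeded in one sub-index and failed in another, the first holds one extra trailing
document; after resynchronize() both have the same length again.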

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

