lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1879) Parallel incremental indexing
Date Fri, 26 Mar 2010 19:22:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850313#action_12850313 ]

Shai Erera commented on LUCENE-1879:
------------------------------------

The way I planned to support multi-threaded indexing is a two-phase addDocument. First,
allocate a doc ID from DocumentsWriter (synchronized), and then add the Document to each Slice
with that doc ID. DocumentsWriter was not supposed to know that it is part of a parallel index.
Something like the following:
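{code}
// Phase 1: reserve a doc ID (the only synchronized step).
int docId = obtainDocId();
// Phase 2: add the Document to every slice under that same doc ID.
for (IndexWriter slice : slices) {
  slice.addDocument(docId, document);
}
{code}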

That allows ParallelWriter to be really an orchestrator/manager of all slices, while each
slice can be an IW on its own.
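
To make the two-phase idea concrete, here is a minimal, self-contained sketch. It does not use Lucene: Slice and ParallelWriterSketch are hypothetical stand-ins, a document is reduced to a String, and addDocument(int, String) mirrors the proposed (not existing) IndexWriter method. It only illustrates that the doc ID allocation is the single synchronized step, while the per-slice adds run outside that lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for one slice of the parallel index (not a Lucene class).
class Slice {
    private final ConcurrentMap<Integer, String> docs = new ConcurrentHashMap<>();

    // Stores the document under a doc ID that was allocated by the caller.
    void addDocument(int docId, String document) {
        if (docs.putIfAbsent(docId, document) != null) {
            throw new IllegalStateException("doc ID already used: " + docId);
        }
    }

    String get(int docId) {
        return docs.get(docId);
    }

    int size() {
        return docs.size();
    }
}

// Hypothetical orchestrator: allocates doc IDs and forwards each add to all slices.
class ParallelWriterSketch {
    private final List<Slice> slices = new ArrayList<>();
    private int nextDocId = 0;

    ParallelWriterSketch(int numSlices) {
        for (int i = 0; i < numSlices; i++) {
            slices.add(new Slice());
        }
    }

    // Phase 1: the only synchronized step, so every caller gets a unique ID.
    private synchronized int obtainDocId() {
        return nextDocId++;
    }

    // Phase 2: runs outside the lock; every slice sees the same doc ID.
    int addDocument(String document) {
        int docId = obtainDocId();
        for (Slice slice : slices) {
            slice.addDocument(docId, document);
        }
        return docId;
    }

    Slice slice(int index) {
        return slices.get(index);
    }
}
```

With this shape, several threads can call addDocument concurrently and the slices stay aligned on doc ID. The sketch ignores flushing, merging, and failure handling (for example, a slice that throws halfway through the loop), all of which a real implementation would have to address.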

Now, when you say ParallelDocumentsWriter, I assume you mean that the DocWriter will be aware
of the slices? I think that is an interesting idea, and it is unrelated to LUCENE-2324. I.e.,
ParallelWriter will invoke its addDocument code, which will get down to ParallelDocumentsWriter,
which will allocate the doc ID itself and call each slice's DocWriter.addDocument? And then
LUCENE-2324 will just improve the performance of that process?
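
As a rough illustration of that alternative, the sketch below moves the doc ID allocation down into a slice-aware writer, so the front end only delegates. All names here (SliceDocWriter, ParallelDocumentsWriterSketch, ParallelWriterFrontEnd) are hypothetical and none of this is Lucene code; documents are again plain Strings, and an AtomicInteger is just one assumed way to hand out IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-slice document writer (stand-in for a slice's DocWriter).
class SliceDocWriter {
    private final List<String> entries = new ArrayList<>();

    synchronized void addDocument(int docId, String document) {
        entries.add(docId + ":" + document);
    }

    // Returns a copy so callers cannot modify the writer's state.
    synchronized List<String> contents() {
        return new ArrayList<>(entries);
    }
}

// Hypothetical slice-aware writer: it allocates the doc ID itself
// and then calls each slice's DocWriter with that ID.
class ParallelDocumentsWriterSketch {
    private final List<SliceDocWriter> sliceWriters;
    private final AtomicInteger nextDocId = new AtomicInteger();

    ParallelDocumentsWriterSketch(List<SliceDocWriter> sliceWriters) {
        this.sliceWriters = new ArrayList<>(sliceWriters);
    }

    int addDocument(String document) {
        int docId = nextDocId.getAndIncrement();
        for (SliceDocWriter writer : sliceWriters) {
            writer.addDocument(docId, document);
        }
        return docId;
    }
}

// Hypothetical front end: no ID handling here, it only delegates downward.
class ParallelWriterFrontEnd {
    private final ParallelDocumentsWriterSketch docWriter;

    ParallelWriterFrontEnd(ParallelDocumentsWriterSketch docWriter) {
        this.docWriter = docWriter;
    }

    int addDocument(String document) {
        return docWriter.addDocument(document);
    }
}
```

The difference from the orchestrator approach is where the knowledge of slices lives: here it sits inside the documents writer rather than above a set of independent IndexWriters.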

This might require a bigger change to IW than I had anticipated, but perhaps it's worth it.

What do you think?

> Parallel incremental indexing
> -----------------------------
>
>                 Key: LUCENE-1879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1879
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>         Attachments: parallel_incremental_indexing.tar
>
>
> A new feature that allows building parallel indexes and keeping them in sync on a docID
> level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
> Discussion on java-dev:
> http://markmail.org/thread/ql3oxzkob7aqf3jd

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

