lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1879) Parallel incremental indexing
Date Fri, 26 Mar 2010 19:42:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850322#action_12850322 ]

Grant Ingersoll commented on LUCENE-1879:
-----------------------------------------

First off, I have only skimmed the code and the comments here, but this is something I've
had in my head for a long time, though I don't have any code for it.  When I think about
the whole update problem, I keep coming back to the notion of Photoshop layers, which essentially
mask the underlying part of the photo without damaging it.  The analogy isn't exact
here, but nevertheless...

This leads me to wonder if the solution isn't best achieved at the index level and not at
the Reader/Writer level.  

So, thinking out loud here, and I'm not sure of the best wording for this:
when a document first comes in, it is all in one place, just as it is now.  Then, when an
update comes in on a particular field, we somehow mark in the index that the document in question
is modified, and we add the new change onto the end of the index (just like we currently
do when adding new docs, but this time it's just a doc with a single field).  Then, when searching,
we would, when scoring the affected documents, consult a secondary process that knows where to
look up the incremental changes.  As background merging takes place, these "disjoint" documents
would be merged back together.  We could maybe even consider a "high update" merge scheduler that
handles these incremental merges more frequently.  In a sense, the old field for that
document is masked by the new field.  I think, given the proper index structure, that we _maybe_
could make that marking of the old field fast (maybe it's a pointer to the new field, maybe
it's just a bit indicating to go look in the "update" segment).
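
To make the layering idea concrete, here is a minimal in-memory sketch in plain Java.  It is
not Lucene code and does not use any Lucene API: the class name LayeredIndexSketch and all of
its methods are hypothetical, documents are modeled as simple field maps, and the "mark" is
assumed to be one bit per docID.  It only illustrates the mask, lookup, and merge-back steps
described above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class LayeredIndexSketch {
    // Base layer: whole documents, written once when the document first comes in.
    private final List<Map<String, String>> base = new ArrayList<>();
    // Update layer: single-field changes appended later, keyed by docID.
    private final Map<Integer, Map<String, String>> updates = new HashMap<>();
    // One bit per docID; a set bit means "go look in the update layer".
    private final BitSet modified = new BitSet();

    /** Adds a complete document and returns its docID. */
    public int addDocument(Map<String, String> fields) {
        base.add(new HashMap<>(fields));
        return base.size() - 1;
    }

    /** Records a field update without touching the base document. */
    public void updateField(int docId, String field, String value) {
        if (docId < 0 || docId >= base.size()) {
            throw new IllegalArgumentException("unknown docID: " + docId);
        }
        updates.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        modified.set(docId);
    }

    /** Reads a field; an updated value masks the base value. */
    public String getField(int docId, String field) {
        if (modified.get(docId)) {
            String masking = updates.get(docId).get(field);
            if (masking != null) {
                return masking;
            }
        }
        return base.get(docId).get(field);
    }

    /** Number of documents currently split across both layers. */
    public int disjointCount() {
        return modified.cardinality();
    }

    /** Stand-in for the background merge: folds updates back into the base layer. */
    public void mergeUpdates() {
        for (int id = modified.nextSetBit(0); id >= 0; id = modified.nextSetBit(id + 1)) {
            base.get(id).putAll(updates.remove(id));
        }
        modified.clear();
    }

    public static void main(String[] args) {
        LayeredIndexSketch index = new LayeredIndexSketch();
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "old title");
        doc.put("body", "text");
        int id = index.addDocument(doc);
        index.updateField(id, "title", "new title");
        System.out.println(index.getField(id, "title") + " / " + index.disjointCount());
        index.mergeUpdates();
        System.out.println(index.getField(id, "title") + " / " + index.disjointCount());
    }
}
```

Unmodified documents pay only for one bit test on the read path, which is the property the
"just a bit" variant is after.  A pointer-based variant would replace the bit with a direct
reference to the new field.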

On the search side, I think performance would still be maintained because even in high-update
environments you aren't usually talking about more than a few thousand changes in a minute or two,
and the background merger would be responsible for keeping the total number of disjoint documents
low.
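
One way to picture "keeping the total number of disjoint documents low" is a simple threshold
trigger, sketched below in plain Java.  This is an assumption for illustration, not an existing
Lucene MergeScheduler: the class HighUpdateMergeTrigger, its threshold parameter, and the
Runnable merge callback are all hypothetical, and the merge runs synchronously here, whereas a
real scheduler would do the work in the background.

```java
// Hypothetical illustration only; this is not a Lucene MergeScheduler.
class HighUpdateMergeTrigger {
    private final int maxDisjointDocs; // assumed tuning knob
    private final Runnable merge;      // stand-in for the real merge work
    private int disjointDocs = 0;

    HighUpdateMergeTrigger(int maxDisjointDocs, Runnable merge) {
        if (maxDisjointDocs < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxDisjointDocs = maxDisjointDocs;
        this.merge = merge;
    }

    /**
     * Call once each time a document becomes disjoint (gets its first pending update).
     * Returns true if this call triggered a merge.
     */
    boolean onDocumentBecameDisjoint() {
        disjointDocs++;
        if (disjointDocs >= maxDisjointDocs) {
            merge.run();      // fold the pending updates back into the base documents
            disjointDocs = 0; // nothing is disjoint after the merge
            return true;
        }
        return false;
    }

    int disjointCount() {
        return disjointDocs;
    }

    public static void main(String[] args) {
        HighUpdateMergeTrigger trigger =
            new HighUpdateMergeTrigger(3, () -> System.out.println("merging"));
        for (int i = 0; i < 7; i++) {
            trigger.onDocumentBecameDisjoint();
        }
        System.out.println("still disjoint: " + trigger.disjointCount());
    }
}
```

With a bound like this, the extra lookups at search time are capped by the threshold rather
than by the update rate.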

> Parallel incremental indexing
> -----------------------------
>
>                 Key: LUCENE-1879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1879
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>         Attachments: parallel_incremental_indexing.tar
>
>
> A new feature that allows building parallel indexes and keeping them in sync on a docID
> level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
> Discussion on java-dev:
> http://markmail.org/thread/ql3oxzkob7aqf3jd

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

