lucene-dev mailing list archives

From "Eks Dev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-1879) Parallel incremental indexing
Date Mon, 01 Aug 2011 08:47:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073462#comment-13073462 ]

Eks Dev commented on LUCENE-1879:
---------------------------------

The user mentioned in the comment above was me, I guess. I am commenting here just to add an
interesting use case that this issue would solve perfectly.

Imagine a Solr master-slave setup where the full document contains CONTENT and ID fields, e.g.
a collection of 200 million+ documents. On the master, we need the ID field indexed in order to
process delete/update commands. On the slaves, we do not need lookup on ID and would like to keep
our TermsDictionary small, without bloating it with 200 million+ unique ID terms (ouch, this is
a lot compared to 5 million unique terms in CONTENT, with or without pulsing).

With this issue, this could be achieved natively by modifying the Solr UpdateHandler so that it
does not transfer the "ID-Index" to slaves at all.

There are other ways to fix it, but this would be the best. (I am currently investigating an
option to transfer the full index on update, but to filter out the ID terms from the
TermsDictionary at the IndexReader level: they remain on disk, but that part never gets accessed
on slaves. I do not know yet if this is possible in general, e.g. when an FST-based term
dictionary is already built; with a prefix-compressed TermDict it would be doable.)
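
The reader-level filtering idea can be modelled with a small, library-free sketch. This is only a toy illustration under stated assumptions: `TermIndex`, `MapTermIndex`, `FieldHidingIndex`, `buildMaster` and `slaveView` are hypothetical names invented here, not Lucene or Solr classes, and an in-memory map stands in for the on-disk term dictionary. It shows a read-side wrapper that leaves the underlying data untouched but never exposes the ID terms.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: every type here is hypothetical, none is a Lucene or Solr class. */
public class HiddenFieldSketch {

    /** Minimal stand-in for a per-field term dictionary. */
    interface TermIndex {
        Set<String> fields();
        SortedSet<String> terms(String field);
    }

    /** In-memory index: field name -> sorted unique terms. */
    static final class MapTermIndex implements TermIndex {
        private final Map<String, SortedSet<String>> byField = new TreeMap<>();

        void add(String field, String term) {
            byField.computeIfAbsent(field, f -> new TreeSet<>()).add(term);
        }

        @Override
        public Set<String> fields() {
            return Collections.unmodifiableSet(byField.keySet());
        }

        @Override
        public SortedSet<String> terms(String field) {
            SortedSet<String> t = byField.get(field);
            if (t == null) {
                return Collections.<String>emptySortedSet();
            }
            return Collections.unmodifiableSortedSet(t);
        }
    }

    /** Read-side wrapper that hides selected fields; the wrapped index is not modified. */
    static final class FieldHidingIndex implements TermIndex {
        private final TermIndex delegate;
        private final Set<String> hidden;

        FieldHidingIndex(TermIndex delegate, Set<String> hidden) {
            this.delegate = delegate;
            this.hidden = new HashSet<>(hidden);
        }

        @Override
        public Set<String> fields() {
            Set<String> visible = new TreeSet<>(delegate.fields());
            visible.removeAll(hidden);
            return visible;
        }

        @Override
        public SortedSet<String> terms(String field) {
            // Hidden fields look empty to the caller, so their terms are never read.
            if (hidden.contains(field)) {
                return Collections.<String>emptySortedSet();
            }
            return delegate.terms(field);
        }
    }

    /** Master view: both CONTENT and ID are indexed (ID is needed for delete/update). */
    static TermIndex buildMaster() {
        MapTermIndex index = new MapTermIndex();
        index.add("CONTENT", "lucene");
        index.add("CONTENT", "index");
        index.add("ID", "doc-1");
        index.add("ID", "doc-2");
        index.add("ID", "doc-3");
        return index;
    }

    /** Slave view: same underlying data, but the ID field is filtered out on read. */
    static TermIndex slaveView(TermIndex master) {
        return new FieldHidingIndex(master, Collections.singleton("ID"));
    }

    public static void main(String[] args) {
        TermIndex master = buildMaster();
        TermIndex slave = slaveView(master);
        System.out.println("master fields: " + master.fields());
        System.out.println("slave fields:  " + slave.fields());
        System.out.println("slave ID terms: " + slave.terms("ID").size());
    }
}
```

Whether a real term dictionary implementation permits this kind of partial access is exactly the open question raised above; the sketch only illustrates the intended behaviour, not how it would be done inside Lucene.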

> Parallel incremental indexing
> -----------------------------
>
>                 Key: LUCENE-1879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1879
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 4.0
>
>         Attachments: parallel_incremental_indexing.tar
>
>
> A new feature that allows building parallel indexes and keeping them in sync on a docID
> level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
> Discussion on java-dev:
> http://markmail.org/thread/ql3oxzkob7aqf3jd

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

