lucene-dev mailing list archives

From "Eks Dev (JIRA)" <>
Subject [jira] [Commented] (LUCENE-1879) Parallel incremental indexing
Date Mon, 01 Aug 2011 08:47:09 GMT


Eks Dev commented on LUCENE-1879:

The user mentioned in the comment above was me, I guess. I am commenting here just to add an interesting
use case that this issue would solve perfectly.

Imagine a Solr master-slave setup where each document contains CONTENT and ID fields, in a collection of, e.g., 200M+
documents. On the master, the ID field must be indexed in order to process delete/update commands.
On the slaves, we do not need lookups on ID and would like to keep the TermsDictionary small, rather than
bloating it with 200M+ unique ID terms (ouch, that is a lot compared to the 5M
unique terms in CONTENT, with or without pulsing).
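To make the size argument concrete: a primary-key field contributes one unique term per document, while a content field draws on a bounded vocabulary. The toy below is plain Java, not Lucene code; the class `TermsDictSize` and its `Doc` record are hypothetical names used only for illustration, and it simply counts unique terms per field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration (hypothetical, not the Lucene API): count unique terms per field
// to show why an ID field dominates the terms dictionary as the collection grows.
class TermsDictSize {

    /** A hypothetical document: a unique ID plus whitespace-tokenized content. */
    record Doc(String id, String content) {}

    /** Returns field name -> number of unique terms in that field. */
    static Map<String, Integer> uniqueTermsPerField(List<Doc> docs) {
        Set<String> idTerms = new HashSet<>();
        Set<String> contentTerms = new HashSet<>();
        for (Doc d : docs) {
            idTerms.add(d.id()); // every ID is a distinct term
            for (String t : d.content().split("\\s+")) {
                contentTerms.add(t); // content reuses a bounded vocabulary
            }
        }
        Map<String, Integer> sizes = new HashMap<>();
        sizes.put("ID", idTerms.size());
        sizes.put("CONTENT", contentTerms.size());
        return sizes;
    }

    public static void main(String[] args) {
        String[] vocab = {"lucene", "solr", "index", "term", "query"};
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            // IDs are all unique; content cycles through a 5-word vocabulary
            docs.add(new Doc("doc-" + i, vocab[i % 5] + " " + vocab[(i + 1) % 5]));
        }
        Map<String, Integer> sizes = uniqueTermsPerField(docs);
        System.out.println("ID terms: " + sizes.get("ID"));           // 10000
        System.out.println("CONTENT terms: " + sizes.get("CONTENT")); // 5
    }
}
```

The ID term count grows linearly with the number of documents, which is why 200M+ IDs dwarf a few million content terms.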

With this issue, this could be achieved natively by modifying the Solr UpdateHandler so that it does not transfer
the "ID-Index" to slaves at all.
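A minimal sketch of the idea, assuming the index is split into named parallel sub-indexes that share docIDs. Everything here is hypothetical: `ParallelReplicationSketch`, `addDocument`, `replicate`, and the sub-index names `"content"` and `"id"` are illustrative only and do not model Solr's UpdateHandler or replication code. The master writes every document to all sub-indexes, and replication simply skips the excluded ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not the Lucene/Solr API): one logical index split into
// parallel sub-indexes that share docIDs, so a slave can receive only some of them.
class ParallelReplicationSketch {

    /** Sub-index name -> per-document value; list position == docID. */
    final Map<String, List<String>> subIndexes = new LinkedHashMap<>();

    ParallelReplicationSketch(String... names) {
        for (String n : names) subIndexes.put(n, new ArrayList<>());
    }

    /** Adds one document; every sub-index stores its slice at the same docID. */
    int addDocument(Map<String, String> valuesBySubIndex) {
        int docId = -1;
        for (Map.Entry<String, List<String>> e : subIndexes.entrySet()) {
            List<String> docs = e.getValue();
            docId = docs.size(); // identical in all sub-indexes, since they grow together
            docs.add(valuesBySubIndex.get(e.getKey()));
        }
        return docId;
    }

    /** Copies every sub-index except the excluded ones (e.g. the ID index). */
    ParallelReplicationSketch replicate(Set<String> excluded) {
        ParallelReplicationSketch slave = new ParallelReplicationSketch();
        for (Map.Entry<String, List<String>> e : subIndexes.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                slave.subIndexes.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        return slave;
    }

    public static void main(String[] args) {
        ParallelReplicationSketch master = new ParallelReplicationSketch("content", "id");
        master.addDocument(Map.of("content", "parallel incremental indexing", "id", "doc-1"));
        master.addDocument(Map.of("content", "terms dictionary size", "id", "doc-2"));
        ParallelReplicationSketch slave = master.replicate(Set.of("id"));
        System.out.println(slave.subIndexes.keySet());              // prints [content]
        System.out.println(slave.subIndexes.get("content").get(1)); // prints terms dictionary size
    }
}
```

Because docIDs are aligned, the slave's content sub-index is still a complete, consistent index on its own; only the ID lookup structure is left behind on the master.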

There are other ways to solve this, but this would be the best. (I am currently investigating an
option to transfer the full index on update but filter out the ID part of the TermsDictionary at the IndexReader
level: it remains on disk, but that part is never accessed on slaves. I do not know yet
whether this is possible at all in general, e.g. when an FST-based terms dictionary is already built; with a
prefix-compressed TermDict it would be doable.)
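The reader-level alternative can be pictured as a read-only view that hides some fields of the terms dictionary. This is a toy sketch under stated assumptions: `FilteredTermsView`, `fields()` and `terms()` are hypothetical names, and the "dictionary" is just an in-memory map. It only shows the hiding behaviour; it does not address the open question above, since a real FST-based dictionary may already have paid the cost of loading the hidden terms.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy sketch (hypothetical, not the Lucene API): a read-only view over a per-field
// terms dictionary that hides some fields, analogous to wrapping a reader on the
// slave so that the ID terms are never accessed.
class FilteredTermsView {
    private final Map<String, SortedSet<String>> termsByField; // the full "on-disk" dictionary
    private final Set<String> hiddenFields;

    FilteredTermsView(Map<String, SortedSet<String>> termsByField, Set<String> hiddenFields) {
        this.termsByField = termsByField;
        this.hiddenFields = hiddenFields;
    }

    /** Visible field names, in sorted order. */
    SortedSet<String> fields() {
        SortedSet<String> visible = new TreeSet<>(termsByField.keySet());
        visible.removeAll(hiddenFields);
        return visible;
    }

    /** Terms of a field, or an empty set if the field is hidden or absent. */
    SortedSet<String> terms(String field) {
        if (hiddenFields.contains(field) || !termsByField.containsKey(field)) {
            return Collections.emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(termsByField.get(field));
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> dict = new TreeMap<>();
        dict.put("CONTENT", new TreeSet<>(Set.of("index", "lucene", "solr")));
        dict.put("ID", new TreeSet<>(Set.of("doc-1", "doc-2", "doc-3")));
        FilteredTermsView slaveView = new FilteredTermsView(dict, Set.of("ID"));
        System.out.println(slaveView.fields());         // prints [CONTENT]
        System.out.println(slaveView.terms("ID"));      // prints []
        System.out.println(slaveView.terms("CONTENT")); // prints [index, lucene, solr]
    }
}
```

The underlying data is untouched; only what the slave can reach through the view changes.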

> Parallel incremental indexing
> -----------------------------
>                 Key: LUCENE-1879
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 4.0
>         Attachments: parallel_incremental_indexing.tar
> A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
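What "in sync on a docID level" means can be shown with a deliberately simplified toy. This is not the LUCENE-1879 patch: `AlignedParallelIndexes`, `add`, `delete` and `compact` are hypothetical names, and real segment merges chosen by a MergePolicy/MergeScheduler are collapsed here into a single `compact()` call. The invariant it illustrates is that deletions are shared and every compaction applies the same old-to-new docID mapping to all sub-indexes, so docID N always refers to the same document in each of them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy sketch (hypothetical, not the LUCENE-1879 implementation): two parallel
// sub-indexes whose docIDs stay aligned because deletions are shared and every
// compaction drops the same documents from all sub-indexes in one pass.
class AlignedParallelIndexes {
    final List<String> contentIndex = new ArrayList<>(); // position == docID
    final List<String> idIndex = new ArrayList<>();      // position == docID
    private final BitSet deleted = new BitSet();         // deletions shared by both

    /** Appends one document to both sub-indexes and returns its common docID. */
    int add(String content, String id) {
        contentIndex.add(content);
        idIndex.add(id);
        return contentIndex.size() - 1;
    }

    void delete(int docId) {
        deleted.set(docId);
    }

    /** Removes deleted docs from both sub-indexes, keeping surviving docs aligned. */
    void compact() {
        List<String> newContent = new ArrayList<>();
        List<String> newIds = new ArrayList<>();
        for (int doc = 0; doc < contentIndex.size(); doc++) {
            if (!deleted.get(doc)) {
                newContent.add(contentIndex.get(doc));
                newIds.add(idIndex.get(doc));
            }
        }
        contentIndex.clear();
        contentIndex.addAll(newContent);
        idIndex.clear();
        idIndex.addAll(newIds);
        deleted.clear();
    }

    public static void main(String[] args) {
        AlignedParallelIndexes idx = new AlignedParallelIndexes();
        idx.add("first doc", "doc-1");
        int second = idx.add("second doc", "doc-2");
        idx.add("third doc", "doc-3");
        idx.delete(second);
        idx.compact();
        // docID 1 now refers to the third document in both sub-indexes
        System.out.println(idx.contentIndex.get(1) + " / " + idx.idIndex.get(1)); // prints third doc / doc-3
    }
}
```

If the two sub-indexes were compacted independently (say, on different schedules), this invariant would break, which is the problem the issue aims to solve independently of the merge settings.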
> Find details on the wiki page for this feature:
> Discussion on java-dev:

This message is automatically generated by JIRA.
For more information on JIRA, see:

