lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1879) Parallel incremental indexing
Date Fri, 06 Nov 2009 10:17:32 GMT


Michael McCandless commented on LUCENE-1879:

This sounds great!  In fact your proposal for a ParallelSegmentWriter
is just like what I'm picturing -- making the switching "down low"
instead of "up high" (above Lucene).  This'd be more generic than just
the postings files, since all index files can be separately written.
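Here is a minimal sketch of switching "down low". It assumes one logical segment is written by several sub-writers, each owning a disjoint subset of fields. `ParallelSegmentWriterSketch`, `SubWriter` and the field names are hypothetical. They are not Lucene classes, and this is not Michael Busch's actual design. The sketch shows one thing: feeding every sub-writer every document, even a document with no fields for that partition, keeps the docIDs aligned by construction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Lucene's API): one logical segment written by several
// sub-writers, each owning a disjoint subset of fields.
public class ParallelSegmentWriterSketch {

    /** Hypothetical per-partition writer: records the fields it owns for each docID. */
    static final class SubWriter {
        final String partitionName;
        final List<Map<String, String>> docs = new ArrayList<>();

        SubWriter(String partitionName) {
            this.partitionName = partitionName;
        }

        int addDocument(Map<String, String> fields) {
            docs.add(fields);       // even an empty map takes a slot, so docIDs stay aligned
            return docs.size() - 1; // docID assigned by this partition
        }
    }

    private final Map<String, SubWriter> fieldToPartition = new HashMap<>();
    private final List<SubWriter> partitions = new ArrayList<>();

    SubWriter addPartition(String name, String... fields) {
        SubWriter w = new SubWriter(name);
        partitions.add(w);
        for (String f : fields) {
            fieldToPartition.put(f, w);
        }
        return w;
    }

    /** Splits one document by field, feeds every partition, and returns the shared docID. */
    int addDocument(Map<String, String> doc) {
        Map<SubWriter, Map<String, String>> split = new HashMap<>();
        for (SubWriter w : partitions) {
            split.put(w, new HashMap<>());
        }
        for (Map.Entry<String, String> e : doc.entrySet()) {
            SubWriter w = fieldToPartition.get(e.getKey());
            if (w == null) {
                throw new IllegalArgumentException("no partition owns field " + e.getKey());
            }
            split.get(w).put(e.getKey(), e.getValue());
        }
        int docID = -1;
        for (SubWriter w : partitions) {
            int id = w.addDocument(split.get(w));
            if (docID != -1 && id != docID) {
                throw new IllegalStateException("partitions out of sync");
            }
            docID = id;
        }
        return docID;
    }

    public static void main(String[] args) {
        ParallelSegmentWriterSketch writer = new ParallelSegmentWriterSketch();
        SubWriter body = writer.addPartition("body", "title", "text");
        SubWriter meta = writer.addPartition("meta", "popularity");

        writer.addDocument(Map.of("title", "a", "text", "hello", "popularity", "3"));
        int id = writer.addDocument(Map.of("title", "b", "text", "world")); // no "meta" fields
        System.out.println(id + " " + body.docs.size() + " " + meta.docs.size()); // prints "1 2 2"
    }
}
```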

It'd then be a low-level question of whether ParallelSegmentWriter
stores its files in different Directories or in a single directory
with different file names (or maybe in sub-directories within a
directory, or something else).  It could even use FileSwitchDirectory,
e.g. to direct certain segment files to an SSD (another way to achieve
your example).
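The FileSwitchDirectory option comes down to an extension-based routing rule: files whose extension is in a configured set go to the primary directory, and everything else goes to the secondary one. The sketch below mimics that rule in plain Java so it runs without Lucene. Directories are modeled as path strings. `ExtensionRoutingSketch`, the paths, and the choice of extensions are illustrative assumptions. Here the term dictionary (`.tis`/`.tii`) and postings (`.frq`/`.prx`) go to the SSD.

```java
import java.util.Set;

// Hypothetical stand-in for an extension-based routing rule like FileSwitchDirectory's:
// files whose extension is in primaryExtensions go to the primary directory,
// everything else goes to the secondary one. Directories are plain path strings here.
public class ExtensionRoutingSketch {
    private final Set<String> primaryExtensions;
    private final String primaryDir;
    private final String secondaryDir;

    ExtensionRoutingSketch(Set<String> primaryExtensions, String primaryDir, String secondaryDir) {
        this.primaryExtensions = primaryExtensions;
        this.primaryDir = primaryDir;
        this.secondaryDir = secondaryDir;
    }

    /** Returns the text after the last '.', or "" if the name has no extension. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    String directoryFor(String fileName) {
        return primaryExtensions.contains(extensionOf(fileName)) ? primaryDir : secondaryDir;
    }

    public static void main(String[] args) {
        // Assumption: term dictionary and postings on the SSD, stored fields on the disk.
        ExtensionRoutingSketch router = new ExtensionRoutingSketch(
                Set.of("tis", "tii", "frq", "prx"), "/mnt/ssd/index", "/mnt/disk/index");
        for (String f : new String[] {"_0.frq", "_0.prx", "_0.fdt", "_0.fdx", "segments_2"}) {
            System.out.println(f + " -> " + router.directoryFor(f));
        }
    }
}
```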

This should also fit well into LUCENE-1458 (flexible indexing) -- one
of the added test cases there creates a per-field codec wrapper that
lets you use a different codec per field.  Right now, this means
separate file names in the same Directory for that segment, but we
could allow the codecs to use different Directories (or
FileSwitchDirectory as well) if they wanted to.
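Here is a sketch of the per-field dispatch idea. It assumes each codec is identified by a name and a target directory, and that it names its files with a per-codec suffix so several codecs can share one segment. `PerFieldCodecSketch`, `Codec`, the `.pst` extension and the directory paths are hypothetical. This is not the LUCENE-1458 codec API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field codec wrapper (not the LUCENE-1458 API): fields map to
// codecs, unmapped fields fall back to a default codec, and each codec names its
// files with its own suffix so several codecs can coexist in one segment.
public class PerFieldCodecSketch {

    /** Hypothetical codec description: a name plus the directory it writes to. */
    static final class Codec {
        final String name;
        final String directory;

        Codec(String name, String directory) {
            this.name = name;
            this.directory = directory;
        }

        /** Per-codec file name; ".pst" is an invented extension for illustration. */
        String postingsFile(String segment) {
            return segment + "_" + name + ".pst";
        }
    }

    private final Map<String, Codec> perField = new HashMap<>();
    private final Codec defaultCodec;

    PerFieldCodecSketch(Codec defaultCodec) {
        this.defaultCodec = defaultCodec;
    }

    void setCodec(String field, Codec codec) {
        perField.put(field, codec);
    }

    Codec codecFor(String field) {
        return perField.getOrDefault(field, defaultCodec);
    }

    /** Directory plus file name that this field's postings would be written to. */
    String locate(String segment, String field) {
        Codec codec = codecFor(field);
        return codec.directory + "/" + codec.postingsFile(segment);
    }

    public static void main(String[] args) {
        Codec standard = new Codec("standard", "/index");
        Codec custom = new Codec("custom", "/ssd"); // a second codec with its own directory
        PerFieldCodecSketch wrapper = new PerFieldCodecSketch(standard);
        wrapper.setCodec("id", custom);
        System.out.println(wrapper.locate("_0", "id"));   // prints "/ssd/_0_custom.pst"
        System.out.println(wrapper.locate("_0", "body")); // prints "/index/_0_standard.pst"
    }
}
```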

> Different SegmentWriter implementations will allow you to write single
> segments in different ways, e.g. doc-at-a-time (the default one with
> addDocument()) or term-at-a-time (like addIndexes*() works).

Can you elaborate on this?  How is addIndexes* term-at-a-time?
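The sketch below shows one reading of the two write orders. It assumes "term-at-a-time" refers to the term-by-term order in which segment merging, which `addIndexes*()` relies on, writes postings.

* Doc-at-a-time inverts each document as it arrives.
* Term-at-a-time walks the sorted union of terms. For each term, it appends each source segment's postings, shifted by that segment's docID base.

This toy version tracks only docIDs, with no frequencies, positions or deletions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy illustration of two write orders for postings (docIDs only; hypothetical names).
public class WriteOrderSketch {

    /** Doc-at-a-time: invert each document as it arrives, appending its docID per term. */
    static TreeMap<String, List<Integer>> docAtATime(List<List<String>> docs) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int docID = 0; docID < docs.size(); docID++) {
            for (String term : docs.get(docID)) {
                List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
                if (list.isEmpty() || list.get(list.size() - 1) != docID) {
                    list.add(docID); // skip repeats of a term within the same doc
                }
            }
        }
        return postings;
    }

    /**
     * Term-at-a-time: given already-inverted source segments, walk the union of
     * terms in sorted order and, per term, append every segment's postings
     * shifted by that segment's docID base.
     */
    static TreeMap<String, List<Integer>> termAtATime(
            List<TreeMap<String, List<Integer>>> segments, int[] maxDocs) {
        TreeSet<String> allTerms = new TreeSet<>();
        for (TreeMap<String, List<Integer>> seg : segments) {
            allTerms.addAll(seg.keySet());
        }
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (String term : allTerms) {
            List<Integer> out = new ArrayList<>();
            int base = 0;
            for (int i = 0; i < segments.size(); i++) {
                for (int docID : segments.get(i).getOrDefault(term, List.of())) {
                    out.add(base + docID);
                }
                base += maxDocs[i];
            }
            merged.put(term, out);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("a", "b"), List.of("b", "c"), List.of("a", "c"));
        // Build two small segments doc-at-a-time, then combine them term-at-a-time.
        TreeMap<String, List<Integer>> seg0 = docAtATime(docs.subList(0, 2));
        TreeMap<String, List<Integer>> seg1 = docAtATime(docs.subList(2, 3));
        TreeMap<String, List<Integer>> merged = termAtATime(List.of(seg0, seg1), new int[] {2, 1});
        System.out.println(merged);                          // prints "{a=[0, 2], b=[0, 1], c=[1, 2]}"
        System.out.println(merged.equals(docAtATime(docs))); // prints "true"
    }
}
```

Both orders produce identical postings here. What differs is the order in which the writer has to see its input.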

> If we allow (re)writing segments in both dimensions I think we will
> create a more flexible approach which is independent of what data
> structures we add to Lucene.

Dimension 1 is the docs, and dimension 2 is the assignment of fields
into separate partitions?
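Here is a toy model of the two dimensions. It assumes dimension 1 is the docs (segments and how they merge) and dimension 2 is the field partitions. Each partition is reduced to its list of per-segment doc counts. There are no deletions, so divergence shows up as mismatched segment boundaries rather than shifted docIDs. The names are hypothetical. The model shows why a merge must be applied identically to every partition for the partitions to stay in sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each field partition (dimension 2) is a list of per-segment
// doc counts (dimension 1). No deletions are modeled, so drift appears as
// mismatched segment boundaries.
public class TwoDimensionSketch {
    final List<List<Integer>> partitions = new ArrayList<>();

    void addPartition(List<Integer> segmentDocCounts) {
        partitions.add(new ArrayList<>(segmentDocCounts));
    }

    /** True if all partitions have identical segment boundaries. */
    boolean inSync() {
        for (List<Integer> p : partitions) {
            if (!p.equals(partitions.get(0))) {
                return false;
            }
        }
        return true;
    }

    /** Replaces segments [from, to) of one partition with a single merged segment. */
    private static void mergeRange(List<Integer> segs, int from, int to) {
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += segs.get(i);
        }
        segs.subList(from, to).clear(); // drop the merged segments...
        segs.add(from, sum);            // ...and put one combined segment in their place
    }

    /** Dimension 1 merge applied identically to every partition. */
    void mergeEverywhere(int from, int to) {
        for (List<Integer> p : partitions) {
            mergeRange(p, from, to);
        }
    }

    /** Merge in one partition only, as independent per-partition merging could do. */
    void mergeOne(int partition, int from, int to) {
        mergeRange(partitions.get(partition), from, to);
    }

    public static void main(String[] args) {
        TwoDimensionSketch index = new TwoDimensionSketch();
        index.addPartition(List.of(3, 2, 4)); // e.g. a partition holding "title" and "text"
        index.addPartition(List.of(3, 2, 4)); // e.g. a partition holding "popularity"
        index.mergeEverywhere(0, 2);
        System.out.println(index.partitions + " " + index.inSync()); // prints "[[5, 4], [5, 4]] true"
        index.mergeOne(1, 0, 2);
        System.out.println(index.partitions + " " + index.inSync()); // prints "[[5, 4], [9]] false"
    }
}
```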

> Parallel incremental indexing
> -----------------------------
>                 Key: LUCENE-1879
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 3.1
>         Attachments: parallel_incremental_indexing.tar
> A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> Discussion on java-dev:

