lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
Date Mon, 20 Dec 2010 19:13:02 GMT


Michael Busch commented on LUCENE-2814:

bq.  Sorry

No worries - I'm being overly dramatic :)

bq.  Is there any way I can help?

Let me try to get it to compile, and then I'll commit.  I'm sure a bunch of tests will fail;
help would be great then.  Also, a general review of my changes to IndexWriter/DocumentsWriter/DWPT
would be great.  I should be able to commit my merge by the end of today.

> stop writing shared doc stores across segments
> ----------------------------------------------
>                 Key: LUCENE-2814
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch,
> Shared doc stores enable the files for stored fields and term vectors to be shared across
> multiple segments.  We've had this optimization since 2.1 I think.
> It works best against a new index, where you open an IW, add lots of docs, and then close
> it.  In that case all of the written segments will reference slices of a single shared doc store.
> This was a good optimization because it means we never need to merge these files.  But,
> when you open another IW on that index, it writes a new set of doc stores, and then whenever
> merges take place across doc stores, they must now be merged.
> However, since we switched to shared doc stores, there have been two optimizations for
> merging the stores.  First, we now bulk-copy the bytes in these files if the field name/number
> assignment is "congruent".  Second, we now force congruent field name/number mapping in IndexWriter.
> This means the shared doc stores optimization is much less potent than it used to be.
> Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this
> has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly
> forcing a flush when it starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent
> flushing, we can no longer share doc stores.
> So, I think we should turn off the write-side of shared doc stores to pave the path for
> DWPT to land on trunk and simplify IW/DW.  We still must support reading them (until 5.0),
> but the read side is far less hairy.
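
To make the "congruent" condition in the description concrete, here is a minimal Java sketch.
It is not Lucene code: the class CongruentFieldsSketch, its methods, and the list-based model
of field numbering (list index = field number, value = field name) are all assumptions made
for illustration.  The idea it shows is only that bulk-copying stored-field bytes is safe when
every field number in a segment means the same field name in the merged numbering.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Lucene. A field numbering is modeled as a
// list where the index is the field number and the value is the field name.
public class CongruentFieldsSketch {

    // Returns true if every field number used by the segment maps to the
    // same field name in the merged numbering.
    static boolean isCongruent(List<String> segmentFields, List<String> mergedFields) {
        if (segmentFields.size() > mergedFields.size()) {
            return false;
        }
        for (int number = 0; number < segmentFields.size(); number++) {
            if (!segmentFields.get(number).equals(mergedFields.get(number))) {
                return false;
            }
        }
        return true;
    }

    // Chooses how the doc store of one segment would be merged under this model.
    static String mergeStrategy(List<String> segmentFields, List<String> mergedFields) {
        return isCongruent(segmentFields, mergedFields) ? "bulk-copy" : "per-document";
    }

    public static void main(String[] args) {
        List<String> merged = Arrays.asList("id", "title", "body");
        // Same names at the same numbers: raw bytes can be copied as-is.
        System.out.println(mergeStrategy(Arrays.asList("id", "title"), merged));
        // Numbers 0 and 1 are swapped: documents must be rewritten one by one.
        System.out.println(mergeStrategy(Arrays.asList("title", "id"), merged));
    }
}
```

Under this model, forcing a single field name/number mapping in IndexWriter means the first
case is always the one that applies, which is why merging separate doc stores became cheap
enough for sharing them to matter much less.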
