lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
Date Mon, 28 Feb 2011 22:40:37 GMT


Simon Willnauer commented on LUCENE-2881:

For the record, Robert reverted the changes made by this issue since we have been experiencing
a fair number of problems.

eventually reproducible with:
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValued -Dtests.seed=-4971136915249645135:5200209917417531291

ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValuedFcs -Dtests.seed=-4971136915249645135:-3738166620811568832

ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixMultiValued -Dtests.seed=-4971136915249645135:4594369826150277150

ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValued -Dtests.seed=-4971136915249645135:-7702531001769827248

ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValuedFcs -Dtests.seed=-4971136915249645135:698398490325732548


I found the problem causing this: certain field numbers get mixed up when the FieldInfos
are initially built in IndexWriter and the first segment loaded has gaps in its field
numbers. FieldInfos ignores the FieldInfo's number if the FieldInfo does not exist yet and
tries to assign a new "local" field number. So if the next available field number is x while
the FI's actual number is greater than x, the newly added FI is set to x instead.

In other words, let's say we have 2 segments:
 seg1 : { fields : [(a:0, c:2)] } 
 seg2 : { fields : [(a:0, b:1, c:2)] } 
If we load seg1's FI first, we end up with

{code}fields : [(a:0, c:1)] {code}

Then we add seg2's FIs and end up with

{code}fields : [(a:0, c:1, b:2)] {code}
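
The renumbering above can be reproduced with a small standalone sketch. This is not Lucene code and not the patch: `FieldNumberSketch`, `segment`, `mergeIgnoringNumbers` and `mergePreservingNumbers` are hypothetical names, and a plain map from field name to number stands in for FieldInfos. The second method shows one possible remedy (keeping the number stored in the segment), which is only an assumption about what a fix could look like.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldInfos: a map from field name to field number.
class FieldNumberSketch {

    // Builds one segment's fields from alternating (name, number) arguments.
    static Map<String, Integer> segment(Object... nameNumberPairs) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        for (int i = 0; i < nameNumberPairs.length; i += 2) {
            fields.put((String) nameNumberPairs[i], (Integer) nameNumberPairs[i + 1]);
        }
        return fields;
    }

    // Behavior described in the comment: an unknown field gets the next free
    // "local" number, and the number stored in the segment is ignored.
    static Map<String, Integer> mergeIgnoringNumbers(List<Map<String, Integer>> segments) {
        Map<String, Integer> global = new LinkedHashMap<>();
        for (Map<String, Integer> seg : segments) {
            for (String name : seg.keySet()) {
                if (!global.containsKey(name)) {
                    global.put(name, global.size());
                }
            }
        }
        return global;
    }

    // Assumed remedy: keep the number stored in the segment and fail loudly
    // if the same field name shows up with two different numbers.
    static Map<String, Integer> mergePreservingNumbers(List<Map<String, Integer>> segments) {
        Map<String, Integer> global = new LinkedHashMap<>();
        for (Map<String, Integer> seg : segments) {
            for (Map.Entry<String, Integer> e : seg.entrySet()) {
                Integer existing = global.putIfAbsent(e.getKey(), e.getValue());
                if (existing != null && !existing.equals(e.getValue())) {
                    throw new IllegalStateException("conflicting number for field " + e.getKey());
                }
            }
        }
        return global;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> segments = Arrays.asList(
                segment("a", 0, "c", 2),          // seg1: gap at number 1
                segment("a", 0, "b", 1, "c", 2)); // seg2
        System.out.println(mergeIgnoringNumbers(segments));   // prints {a=0, c=1, b=2}
        System.out.println(mergePreservingNumbers(segments)); // prints {a=0, c=2, b=1}
    }
}
```

The first line of output matches the mixed-up numbering shown above; the second keeps c at 2 and b at 1, so both segments still agree with the global numbering.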

This field-number mix-up also explains the TestNRTThreads.testNRTThreads failure, where bulk
merge could not be applied because field numbers differed across segments.
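
As a rough illustration of why mismatched numbers rule out bulk merging, the sketch below checks whether a segment's number-to-name mapping agrees with the merged numbering. It is a simplified assumption about the kind of check involved, not Lucene's actual merge code; `BulkMergeCheckSketch`, `numbering` and `sameNumbering` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical check: raw copying is only safe if every field number in the
// segment refers to the same field name in the merged numbering.
class BulkMergeCheckSketch {

    // Builds a number -> name mapping from alternating (number, name) arguments.
    static Map<Integer, String> numbering(Object... numberNamePairs) {
        Map<Integer, String> fields = new TreeMap<>();
        for (int i = 0; i < numberNamePairs.length; i += 2) {
            fields.put((Integer) numberNamePairs[i], (String) numberNamePairs[i + 1]);
        }
        return fields;
    }

    static boolean sameNumbering(Map<Integer, String> merged, Map<Integer, String> segment) {
        for (Map.Entry<Integer, String> e : segment.entrySet()) {
            if (!e.getValue().equals(merged.get(e.getKey()))) {
                return false; // this number means a different field in the merged view
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, String> mixedUp = numbering(0, "a", 1, "c", 2, "b");
        Map<Integer, String> consistent = numbering(0, "a", 1, "b", 2, "c");
        Map<Integer, String> seg1 = numbering(0, "a", 2, "c");
        Map<Integer, String> seg2 = numbering(0, "a", 1, "b", 2, "c");

        System.out.println(sameNumbering(mixedUp, seg2));    // prints false
        System.out.println(sameNumbering(consistent, seg1)); // prints true
        System.out.println(sameNumbering(consistent, seg2)); // prints true
    }
}
```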

I will upload a patch tomorrow.

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>                 Key: LUCENE-2881
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>         Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch,
> Currently FieldInfo is tracked per IW session to guarantee consistent global field-naming
> / ordering. IW carries FI instances over from previous segments, which also carries over field
> properties like isIndexed etc. While having consistent field ordering per IW session appears
> to be important due to bulk merging stored fields etc., carrying over other properties might
> become problematic with Lucene's Codec support. Codecs that rely on consistent properties
> in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch) for instance writes files per segment and field
> (using the field id within the file name). Yet, if a particular segment has no DocValues
> indexed but a previous segment in the same IW session had DocValues, FieldInfo#docValues
> will be true, since those values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like hasVectors
> or hasProx, which is really something we should manage per Codec & Segment. Ideally FieldInfo
> would be managed per Segment and Codec such that its properties are valid per segment. It
> also seems to be necessary to bind FieldInfoS to SegmentInfo logically since it's really just
> per-segment metadata.
