lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
Date Fri, 18 Mar 2011 18:02:29 GMT


Simon Willnauer commented on LUCENE-2881:

bq. I think we should put a header (id + version, i.e. CodecUtil.write/readHeader) on the fnx file?
Man, I told myself to add it about 10 times during that patch :)
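To illustrate what an "id + version" header buys, here is a minimal sketch using plain java.io rather than Lucene's CodecUtil. The class name FnxHeaderSketch, the magic value, the codec name and the byte layout are all hypothetical assumptions, not Lucene's actual on-disk format. The point is that a reader can reject a file that is not an fnx file, or whose version it does not understand, before parsing the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of an "id + version" header for the fnx file. It mimics the
// idea behind CodecUtil.write/readHeader using plain java.io; the magic value,
// codec name and byte layout are illustrative assumptions, not Lucene's format.
final class FnxHeaderSketch {
  static final int MAGIC = 0x464E5831; // hypothetical: ASCII "FNX1"

  static byte[] writeHeader(String codecName, int version) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(MAGIC);     // identifies the file type
      out.writeUTF(codecName); // identifies the writer of the file
      out.writeInt(version);   // lets readers reject versions they don't understand
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  /** Returns the header's version, or throws if the header is not acceptable. */
  static int readHeader(byte[] data, String expectedName, int minVersion, int maxVersion) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int magic = in.readInt();
      if (magic != MAGIC) {
        throw new IllegalStateException("not an fnx file, magic=" + Integer.toHexString(magic));
      }
      String name = in.readUTF();
      if (!name.equals(expectedName)) {
        throw new IllegalStateException("expected " + expectedName + " but found " + name);
      }
      int version = in.readInt();
      if (version < minVersion || version > maxVersion) {
        throw new IllegalStateException("unsupported version " + version);
      }
      return version;
    } catch (IOException e) {
      throw new UncheckedIOException(e); // e.g. EOFException on a truncated header
    }
  }
}
```

In this sketch a version outside [minVersion, maxVersion] and a wrong magic both fail fast with IllegalStateException, while a truncated file surfaces as an UncheckedIOException.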

bq. When does FieldNumberBiMap init from another...? 
Ah, legacy.
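For readers unfamiliar with the idea, a global field-number bimap maps each field name to one number for the whole index and back again. The sketch below is a hypothetical stand-in (FieldNumberBiMapSketch is not Lucene's class), and the copying constructor is only a guess at what "init from another" refers to in the legacy case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a global field-name <-> field-number bimap: every field
// name gets one number for the whole index, so per-segment FieldInfos can agree on
// numbering without sharing their other properties. Names are illustrative only.
final class FieldNumberBiMapSketch {
  private final Map<String, Integer> nameToNumber = new HashMap<>();
  private final Map<Integer, String> numberToName = new HashMap<>();
  private int nextNumber = 0;

  FieldNumberBiMapSketch() {}

  /** Initializes from another map (assumed to be what the "legacy" path does). */
  FieldNumberBiMapSketch(FieldNumberBiMapSketch other) {
    synchronized (other) {
      nameToNumber.putAll(other.nameToNumber);
      numberToName.putAll(other.numberToName);
      nextNumber = other.nextNumber;
    }
  }

  /** Returns the existing number for fieldName, or assigns the next free one. */
  synchronized int addOrGet(String fieldName) {
    Integer existing = nameToNumber.get(fieldName);
    if (existing != null) {
      return existing;
    }
    int number = nextNumber++;
    nameToNumber.put(fieldName, number);
    numberToName.put(number, fieldName);
    return number;
  }

  synchronized String nameOf(int number) {
    return numberToName.get(number);
  }
}
```

After copying, the two maps evolve independently: a number assigned in the copy does not appear in the original.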

bq. I'm a little worried that we name the new file _X.fnx, because it will appear that this file 'belongs' to segment X, which is dangerous because in some recovery cases we will remove all files associated w/ a given segment (i.e., _X.*). Maybe we can name it without the leading _? I.e., 0.fnx, 1.fnx, etc.?
Right, good catch! We can simply remove the leading underscore; then it should be clear that it's not a file belonging to a certain segment.
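The naming hazard is easy to see with a prefix-based cleanup. In this hypothetical sketch (FnxNamingSketch and filesOfSegment are illustrative names, not Lucene code), deleting "all files of segment _0" by matching the "_0." prefix also sweeps up a global _0.fnx, while 0.fnx survives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of why a leading '_' is risky: recovery code that deletes
// "all files of segment _X" by prefix would also sweep up a global _X.fnx file.
final class FnxNamingSketch {
  /** Files a prefix-based cleanup would treat as belonging to segmentName (e.g. "_0"). */
  static List<String> filesOfSegment(List<String> allFiles, String segmentName) {
    List<String> result = new ArrayList<>();
    String prefix = segmentName + "."; // "_1." does not match "_10.fdt"
    for (String file : allFiles) {
      if (file.startsWith(prefix)) {
        result.add(file);
      }
    }
    return result;
  }

  /** Proposed naming without the leading underscore: 0.fnx, 1.fnx, ... */
  static String fnxFileName(long version) {
    return version + ".fnx";
  }
}
```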

bq. In IW.getGlobalFieldNumberMap... shouldn't that "legacy" logic be....
Yeah, that's where it should be. I will move it in.

bq. The addition of "si.hasProx = hasProx" in SegmentInfo.clone isn't...
True, I will remove it.

bq. Why do we default SegmentInfos.format now...? Seems spooky?
This field hasn't been used in SIS before, so I think the default didn't matter. Now, however, I check the format in files(), so if you create an SIS without reading it, the format is 0. I could certainly make that work with a default of 0, but it seemed natural to assign it current_format. I think it's fine...
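A minimal sketch of why the default matters once files() depends on the format. The constants are hypothetical positive, increasing numbers (an assumption, not Lucene's actual SegmentInfos format values), and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the format constants are illustrative positive, increasing
// numbers (an assumption, not Lucene's actual SegmentInfos format values).
final class SegmentInfosFormatSketch {
  static final int FORMAT_BEFORE_FNX = 1; // hypothetical: no global field map file
  static final int FORMAT_WITH_FNX = 2;   // hypothetical: first format that writes an fnx file
  static final int FORMAT_CURRENT = FORMAT_WITH_FNX;

  // Without this initializer an instance that was never read from disk would have
  // format == 0, and files() would treat it as an old index.
  int format = FORMAT_CURRENT;
  long globalFieldMapVersion = 0;

  /** Lists the format-dependent files this commit references (other files omitted). */
  List<String> files() {
    List<String> files = new ArrayList<>();
    if (format >= FORMAT_WITH_FNX) {
      files.add(globalFieldMapVersion + ".fnx");
    }
    return files;
  }
}
```

With the initializer, a freshly created instance reports the fnx file; forcing format back to 0 makes it disappear from files().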

bq. In SegmentInfos.rollbackCommit shouldn't we set the pendingMapVersion to -1
Ah, good catch! Thanks.
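The rollback issue can be sketched as a small two-phase commit that tracks a pending map version. This is a hypothetical model: the field names only loosely echo SegmentInfos, and the version counter and checks are assumptions made for illustration.

```java
// Hypothetical sketch of a two-phase commit that tracks a pending global field-map
// version; field and method names only loosely echo SegmentInfos and are illustrative.
final class PendingMapVersionSketch {
  long mapVersion = 0;          // version of the last successfully committed map
  long pendingMapVersion = -1;  // -1 means "no commit in progress"
  private long nextVersion = 1;

  void prepareCommit() {
    if (pendingMapVersion != -1) {
      throw new IllegalStateException("prepareCommit was already called");
    }
    pendingMapVersion = nextVersion++; // a real implementation would write the new fnx file here
  }

  void finishCommit() {
    if (pendingMapVersion == -1) {
      throw new IllegalStateException("prepareCommit was not called");
    }
    mapVersion = pendingMapVersion;
    pendingMapVersion = -1;
  }

  void rollbackCommit() {
    // Without this reset, the stale pending version would make the next
    // prepareCommit() throw, and finishCommit() could publish a rolled-back version.
    pendingMapVersion = -1;
  }
}
```

Here a rolled-back version number is simply skipped; the next successful commit publishes version 2.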

I will fix those issues and upload another patch. Thanks, Mike, for reviewing!

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>                 Key: LUCENE-2881
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: Realtime Branch, CSF branch, 4.0
>         Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch
> Currently FieldInfo is tracked per IW session to guarantee consistent global field naming/ordering. IW carries FI instances over from previous segments, which also carries over field properties like isIndexed. While consistent field ordering per IW session appears to be important for bulk merging of stored fields etc., carrying over the other properties may become problematic with Lucene's Codec support: codecs that rely on consistent properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment and field (using the field id within the file name). Yet if a segment has no DocValues indexed but a previous segment in the same IW session had DocValues, FieldInfo#docValues will be true, since those values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like hasVectors or hasProx, which is really something we should manage per Codec & Segment. Ideally, FieldInfo would be managed per Segment and Codec such that its properties are valid per segment. It also seems necessary to bind FieldInfos to SegmentInfo logically, since it's really just per-segment metadata.
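The problem in the description, and the proposed fix, can be modeled in a few lines. This is a hypothetical sketch (FieldInfoScopeSketch and its members are not Lucene's API): field numbers always come from one global map, while the FieldInfos map holding per-field properties is either reused for the whole session or created fresh per segment. A segment-level flag such as hasDocValues is then derived from that segment's own FieldInfos, in the spirit of hasVectors/hasProx.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting session-scoped and per-segment FieldInfos. Field
// numbers come from one global map; all names here are illustrative, not Lucene's API.
final class FieldInfoScopeSketch {
  /** Minimal stand-in for FieldInfo: a global number plus one per-field property. */
  static final class FieldInfoSketch {
    final int number;
    boolean docValues;

    FieldInfoSketch(int number) {
      this.number = number;
    }
  }

  // Simplified global numbering: name -> number, stable across all segments.
  private final Map<String, Integer> globalNumbers = new HashMap<>();

  int numberFor(String field) {
    Integer number = globalNumbers.get(field);
    if (number == null) {
      number = globalNumbers.size();
      globalNumbers.put(field, number);
    }
    return number;
  }

  /** Records a field in the given FieldInfos map; properties are only ever OR-ed in. */
  FieldInfoSketch add(Map<String, FieldInfoSketch> infos, String field, boolean docValues) {
    FieldInfoSketch fi = infos.get(field);
    if (fi == null) {
      fi = new FieldInfoSketch(numberFor(field));
      infos.put(field, fi);
    }
    fi.docValues |= docValues;
    return fi;
  }

  /** Segment-level flag derived from that segment's own FieldInfos (cf. hasVectors/hasProx). */
  static boolean hasDocValues(Map<String, FieldInfoSketch> infos) {
    for (FieldInfoSketch fi : infos.values()) {
      if (fi.docValues) {
        return true;
      }
    }
    return false;
  }
}
```

Reusing one infos map for every segment in the session reproduces the reported behavior: a segment that indexes "price" without DocValues still sees docValues == true. Passing a fresh map per segment keeps the same field number but yields a flag that is valid for that segment.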

This message is automatically generated by JIRA.