lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
Date Fri, 18 Mar 2011 17:30:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008530#comment-13008530
] 

Michael McCandless commented on LUCENE-2881:
--------------------------------------------

Patch looks much better -- I think it's close!  Comments:

  * I think we should put header (id + version, ie
    CodecUtil.write/readHeader) on fnx file?

  * When does FieldNumberBiMap init from another...?  (We have ctor
    that takes a FieldNumberBiMap in... but I don't see who uses
    that).

  * Who uses BiMap.entries()?  Looks like just for testing?  Can you
    add comment saying "just for testing.."?

  * FieldInfos ctor that loads from index file name but makes a new
    bimap seems spooky...?  Is this only used by tests now...?

  * The passed in "version" to SIS.readGlobalFieldMap is unused

  * I'm a little worried that we name the new file _X.fnx, because it
    will appear that this file 'belongs' to segment X, which is
    dangerous because in some recovery cases we will remove all
    files associated w/ a given segment (ie, _X.*).  Maybe, we can
    name it without the leading _?  Ie, 0.fnx, 1.fnx, etc.?

  * In IW.getGlobalFieldNumberMap... shouldn't that "legacy" logic be
    moved inside SegmentInfos?  Ie it's a bit weird now how we first
    pull an (empty) biMap from the legacy SegmentInfos then we go and
    populate it ourselves...?  Then we don't need this method in IW
    (we just ask the SIS for the bimap).

  * Should FieldInfo.putInternal(FieldInfo) assert that the global
    bimap "matches"?  (Ie, it makes no effort to update the global
    bimap).

  * The addition of "si.hasProx = hasProx" in SegmentInfo.clone isn't
    needed; we pass hasProx to the ctor of that si.

  * Why do we default SegmentInfos.format now...?  Seems spooky?

  * In SegmentInfos.rollbackCommit shouldn't we set the
    pendingMapVersion to -1?

Should we backport this to 3.x (after sufficient aging)?


> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch, lucene-2881.patch,
lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global field-naming
/ ordering. IW carries FI instances over from previous segments which also carries over field
properties like isIndexed etc. While having consistent field ordering per IW session appears
to be important due to bulk merging stored fields etc. carrying over other properties might
become problematic with Lucene's Codec support.  Codecs that rely on consistent properties
in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch) for instance writes files per segment and field
(using the field id within the file name). Yet, if a segment has no DocValues indexed in a
particular segment but a previous segment in the same IW session had DocValues, FieldInfo#docValues
will be true  since those values are reused from previous segments. 
> We already work around this "limitation" in SegmentInfo with properties like hasVectors
or hasProx which is really something we should manage per Codec & Segment. Ideally FieldInfo
would be managed per Segment and Codec such that its properties are valid per segment. It
also seems to be necessary to bind FieldInfoS to SegmentInfo logically since its really just
per segment metadata.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message