lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
Date Mon, 21 Mar 2011 13:29:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2881:
------------------------------------

    Attachment: LUCENE-2881.patch

Next iteration; I think we are ready to commit here. I added a couple of test cases for
the fnx file and made sure the files get deleted accordingly even if we fail with exceptions during
prepareCommit & finishCommit. I moved the BW-compat code for building the initial global
map to SegmentInfos and cleaned up the assertions in FieldInfos (also added the suggested assertion
to FIs#putInternal). To avoid missing a fnx file when we open an old index, write the
fnx file, and keep that SIS in memory with its format set to some old version, I removed the check
for whether we are on a 4.0 index and instead use latestGlobalFieldNumberVersion, which is kept
consistent, and only include the file if it is not set to 0.
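
A minimal sketch of the "only include the file if the version is not 0" rule described above.
This is not the code from the patch: the class name, the commitFiles helper and the fnx file
naming scheme are hypothetical and only illustrate the decision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Lucene's SegmentInfos implementation.
public class GlobalFieldMapFiles {

    /**
     * Lists the files a commit references. The fnx file is included based
     * only on the global field number version, not on the index format.
     */
    static List<String> commitFiles(String segmentsFileName,
                                    long latestGlobalFieldNumberVersion) {
        List<String> files = new ArrayList<>();
        files.add(segmentsFileName);
        // Assumption: version 0 means no global field map has been written yet.
        if (latestGlobalFieldNumberVersion != 0) {
            // Assumed naming scheme, for illustration only.
            files.add("_" + Long.toString(latestGlobalFieldNumberVersion,
                                          Character.MAX_RADIX) + ".fnx");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(commitFiles("segments_2", 0));
        System.out.println(commitFiles("segments_2", 1));
    }
}
```

Keying the decision on the version alone means an old-format SIS that is still held in memory
cannot cause an already written fnx file to be left out.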

I have been running whileTrue tests on this patch for a while now and things are looking good from
my side. Mike, if you have time, let beast chew on it again. If it's fine I will commit tomorrow.

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch,
lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global field naming
/ ordering. IW carries FI instances over from previous segments, which also carries over field
properties like isIndexed. While consistent field ordering per IW session appears
to be important for bulk merging of stored fields etc., carrying over other properties might
become problematic with Lucene's Codec support. Codecs that rely on consistent properties
in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment and field
(using the field id within the file name). Yet, if no DocValues are indexed in a
particular segment but a previous segment in the same IW session had DocValues, FieldInfo#docValues
will be true since those values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like hasVectors
or hasProx, which is really something we should manage per Codec & Segment. Ideally FieldInfo
would be managed per Segment and Codec such that its properties are valid per segment. It
also seems to be necessary to bind FieldInfoS to SegmentInfo logically since it's really just
per-segment metadata.
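
The split described above can be sketched roughly as follows: field numbers are shared across
the IW session so ordering stays consistent, while properties such as docValues live in a
per-segment structure. All class and method names here are hypothetical stand-ins, not the
classes from the patch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not Lucene's FieldInfos implementation.
public class PerSegmentFieldInfos {

    /** Session-wide name-to-number map. It holds no field properties. */
    static final class GlobalFieldNumbers {
        private final Map<String, Integer> numbers = new HashMap<>();

        synchronized int numberFor(String name) {
            Integer n = numbers.get(name);
            if (n == null) {
                n = numbers.size();   // next free number
                numbers.put(name, n);
            }
            return n;
        }
    }

    /** Field metadata that is valid for one segment only. */
    static final class FieldInfo {
        final String name;
        final int number;
        boolean docValues;

        FieldInfo(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    /** One instance per segment; only the numbering is shared. */
    static final class SegmentFieldInfos {
        private final GlobalFieldNumbers global;
        private final Map<String, FieldInfo> byName = new LinkedHashMap<>();

        SegmentFieldInfos(GlobalFieldNumbers global) {
            this.global = global;
        }

        FieldInfo add(String name, boolean docValues) {
            FieldInfo fi = byName.get(name);
            if (fi == null) {
                fi = new FieldInfo(name, global.numberFor(name));
                byName.put(name, fi);
            }
            fi.docValues |= docValues; // never inherited from other segments
            return fi;
        }
    }

    public static void main(String[] args) {
        GlobalFieldNumbers global = new GlobalFieldNumbers();
        SegmentFieldInfos seg1 = new SegmentFieldInfos(global);
        SegmentFieldInfos seg2 = new SegmentFieldInfos(global);
        seg1.add("title", false);
        seg1.add("price", true);                    // DocValues in segment 1
        FieldInfo price2 = seg2.add("price", false);
        System.out.println(price2.number + " " + price2.docValues);
    }
}
```

In this sketch the second segment sees the same field number for "price" as the first, but its
docValues flag reflects only what was indexed in that segment.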

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

