lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
Date Wed, 02 Mar 2011 18:44:37 GMT


Simon Willnauer commented on LUCENE-2881:

Unfortunately, I'm still seeing the assert trip (that there are no non-bulk merges) in TestNRTThreads.
Took beast a while to repro but eventually it did...

This is a problem we have seen before. An exception occurs while we write a single-doc segment:
inverting the doc fails after FI has already been updated. Since hasVectors has been moved out
of SI into FI, the state is inconsistent, and the TV code tries to open the files because hasVectors is true.

This is also why buschmi added clearVectors (just as a workaround, afaik).
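
To make the failure mode concrete, here is a minimal sketch, not Lucene code. All class and method names (FieldMeta, SegmentSketch, failingSingleDocSegment) are hypothetical stand-ins, and the clearVectors here only borrows the name of the workaround; its real behavior is not described in this thread. The sketch assumes the per-field flag is set before inversion runs, so a failure in between leaves hasVectors true with no vector files written.

```java
import java.util.HashMap;
import java.util.Map;

public class VectorFlagSketch {

    /** Hypothetical stand-in for per-field metadata; not Lucene's FieldInfo. */
    public static final class FieldMeta {
        boolean storeTermVector;
    }

    /** Hypothetical stand-in for a segment that is being written. */
    public static final class SegmentSketch {
        private final Map<String, FieldMeta> fields = new HashMap<>();
        private boolean vectorFilesWritten;

        /** Derived from the per-field flags, as when hasVectors lives in FI. */
        public boolean hasVectors() {
            for (FieldMeta m : fields.values()) {
                if (m.storeTermVector) {
                    return true;
                }
            }
            return false;
        }

        public void addDocument(String field, boolean withVectors, boolean failInversion) {
            // Step 1: the field metadata is updated first.
            FieldMeta m = fields.computeIfAbsent(field, k -> new FieldMeta());
            m.storeTermVector |= withVectors;
            // Step 2: inversion may fail after the metadata already changed.
            if (failInversion) {
                throw new IllegalStateException("simulated inversion failure");
            }
            // Step 3: vector files are only written when inversion succeeded.
            if (withVectors) {
                vectorFilesWritten = true;
            }
        }

        /** Hypothetical workaround: drop the vector flags again. */
        public void clearVectors() {
            for (FieldMeta m : fields.values()) {
                m.storeTermVector = false;
            }
        }

        /** True when the flag agrees with what is actually on disk. */
        public boolean isConsistent() {
            return hasVectors() == vectorFilesWritten;
        }
    }

    /** Builds a single-doc segment whose only document fails during inversion. */
    public static SegmentSketch failingSingleDocSegment() {
        SegmentSketch seg = new SegmentSketch();
        try {
            seg.addDocument("body", true, true);
        } catch (IllegalStateException expected) {
            // The document is dropped, but the flag update has already happened.
        }
        return seg;
    }
}
```

In this sketch the segment reports hasVectors() == true although nothing was written, which is the state in which a reader would go looking for files that do not exist.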

bq. Also, I think we should understand why Solr's SimpleFacetsTest failed from the previous
patch and hopefully make a standalone test showing the problem (and that we fixed it!).

Man, I'm still trying to trigger it in isolation, but I can't spend too much time on that right now.
Got sucked into Tiered Flushing ;)

About the other comments (the big bullet-point list): in the meantime, I think we should split
this up into several smaller issues:
 * start with resetting the FI after flush to make the flags consistent for each segment
 * move FIs into SI with hasVectors etc still in SI
 * factor hasVectors out of SI into FI
 * introduce a global field num map (maybe on realtime first?) and / or store global map
 * ...
We can still use this patch as a PoC. Does that make sense?
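
A rough sketch of where those steps could end up, not the patch itself: field numbers come from a session-wide map so numbering stays consistent across segments, while the flags live in a per-segment object that starts empty after each flush. GlobalFieldNumbers and SegmentFieldInfos are hypothetical names for illustration and are not taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerSegmentFieldInfosSketch {

    /** Hypothetical session-wide name-to-number map; only the numbering is shared. */
    public static final class GlobalFieldNumbers {
        private final Map<String, Integer> numbers = new LinkedHashMap<>();

        public synchronized int numberFor(String name) {
            Integer n = numbers.get(name);
            if (n == null) {
                n = numbers.size(); // next free number
                numbers.put(name, n);
            }
            return n;
        }
    }

    /** Hypothetical per-segment field metadata; flags are never carried over. */
    public static final class SegmentFieldInfos {
        private final GlobalFieldNumbers global;
        private final Map<Integer, Boolean> vectorsByNumber = new LinkedHashMap<>();

        public SegmentFieldInfos(GlobalFieldNumbers global) {
            this.global = global;
        }

        /** Registers a field for this segment and returns its global number. */
        public int add(String name, boolean storeTermVector) {
            int number = global.numberFor(name);
            // OR the flag with any value already recorded in this segment only.
            vectorsByNumber.merge(number, storeTermVector, Boolean::logicalOr);
            return number;
        }

        /** True only if a field in this segment stores term vectors. */
        public boolean hasVectors() {
            return vectorsByNumber.containsValue(true);
        }

        public int fieldCount() {
            return vectorsByNumber.size();
        }
    }
}
```

"Resetting the FI after flush" then amounts to creating a fresh SegmentFieldInfos for the next segment while reusing the same GlobalFieldNumbers, so a field keeps its number but a later segment without vectors does not inherit hasVectors from an earlier one.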

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>                 Key: LUCENE-2881
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>         Attachments: LUCENE-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch,
lucene-2881.patch, lucene-2881.patch
> Currently FieldInfo is tracked per IW session to guarantee consistent global field-naming
/ ordering. IW carries FI instances over from previous segments which also carries over field
properties like isIndexed etc. While having consistent field ordering per IW session appears
to be important due to bulk merging stored fields etc. carrying over other properties might
become problematic with Lucene's Codec support.  Codecs that rely on consistent properties
in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch) for instance writes files per segment and field
(using the field id within the file name). Yet, if a particular segment has no DocValues indexed
but a previous segment in the same IW session had DocValues, FieldInfo#docValues
will be true, since those values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like hasVectors
or hasProx which is really something we should manage per Codec & Segment. Ideally FieldInfo
would be managed per Segment and Codec such that its properties are valid per segment. It
also seems to be necessary to bind FieldInfoS to SegmentInfo logically since it's really just
per-segment metadata.
