lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
Date Thu, 24 Mar 2011 17:47:05 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010798#comment-13010798 ]

Simon Willnauer commented on LUCENE-2985:
-----------------------------------------

bq. I wonder if we should pass the segmentCodecsBuilder to FieldInfos? This way, FieldInfos.add/update
could set the codecID, instead of the caller doing it after the fact (in DocFieldProcessorPerThread)?

Here is the thing: I first added it to FieldInfos, since that appears to be the place for
that kind of stuff. Yet the first problem is that DocFieldProcessorPerThread caches the FI
for each DFPPerField, so I would really need to add it to each FieldInfo (FI, not FIs). Further,
having another invariant in FIs that only applies while we are writing is something I tried
to prevent in the first place. And eventually, SegmentCodecs is somewhat internal to the
SegmentInfo and not to the FieldInfos, and I tried to couple them only by the codec ID. I agree
this would be easier and less disturbing in the code. I'd love to find a better way to do
that, really... Except for this part in DocFieldProcessorPerThread, it is smooth though :/

> Build SegmentCodecs incrementally for consistent codecIDs during indexing
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2985
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Codecs, Index
>    Affects Versions: CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: CSF branch, 4.0
>
>         Attachments: LUCENE-2985.patch
>
>
> Currently we build the SegmentCodecs during flush, which is fine as long as no codec
> needs to know which fields it should handle. This will change with DocValues, or when we expose
> StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside
> is that we don't have a consistent view of which codec belongs to which field during indexing,
> and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs
> incrementally as fields come in, so that no matter when a codec needs to be selected to process
> a document / field, we have the right codec ID assigned.
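
The incremental approach described above could look roughly like the sketch below.
This is only an illustration and not the code from the attached patch: the names
IncrementalCodecIds, FieldInfoSketch, CodecIdBuilder and assign are hypothetical, and the
codec for a field is passed in as a plain string rather than looked up through Lucene's
real codec machinery. The only detail taken from the issue is that an unassigned field
carries a codec ID of -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of incremental codec ID assignment; not Lucene's actual API. */
public class IncrementalCodecIds {
    /** Per the issue description, unassigned fields carry -1. */
    static final int UNASSIGNED = -1;

    /** Hypothetical stand-in for a per-field info object (an "FI"). */
    static final class FieldInfoSketch {
        final String name;
        int codecId = UNASSIGNED;

        FieldInfoSketch(String name) {
            this.name = name;
        }
    }

    /** Hands out codec IDs as fields arrive, instead of computing them all at flush. */
    static final class CodecIdBuilder {
        private final Map<String, Integer> idsByCodecName = new HashMap<>();
        private final List<String> codecNames = new ArrayList<>();

        /** Assigns an ID the first time a field is seen; later calls leave it unchanged. */
        void assign(FieldInfoSketch fi, String codecName) {
            if (fi.codecId != UNASSIGNED) {
                return; // already assigned, so the ID stays stable during indexing
            }
            Integer id = idsByCodecName.get(codecName);
            if (id == null) {
                // First field using this codec: the next free slot becomes its ID.
                id = codecNames.size();
                codecNames.add(codecName);
                idsByCodecName.put(codecName, id);
            }
            fi.codecId = id;
        }

        /** Codec names in ID order, i.e. what would be recorded at flush time. */
        List<String> codecs() {
            return new ArrayList<>(codecNames);
        }
    }

    public static void main(String[] args) {
        CodecIdBuilder builder = new CodecIdBuilder();
        FieldInfoSketch title = new FieldInfoSketch("title");
        FieldInfoSketch id = new FieldInfoSketch("id");

        // Codec names here are placeholders chosen for the example.
        builder.assign(title, "Standard");
        builder.assign(id, "Pulsing");

        System.out.println(title.name + " -> " + title.codecId);
        System.out.println(id.name + " -> " + id.codecId);
        System.out.println(builder.codecs());
    }
}
```

In this sketch the field object and the builder are coupled only through the integer
codec ID, which mirrors the coupling discussed in the comment above.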

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

