lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1727) Order of stored Fields not maintained
Date Mon, 06 Jul 2009 10:56:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727473#action_12727473 ]

Michael McCandless commented on LUCENE-1727:
--------------------------------------------

bq. If we start guaranteeing that fields get returned in the same order as they were added,
what are the costs?

I'm not yet sure, but I expect it to be a minor added cost; I'll know more as I dig in.

bq. AFAIK, sorting the fields is necessary to group multiple values for the same field, and
it also ensured that segments with the same fields had the same field numbers, which enables
faster segment merging?

Actually, the mapping of field name -> number happens before the sort, so at present we rely
on docs adding their fields in the same order to enable bulk merging of stored fields &
term vectors.  Bulk merging is really a rather brittle optimization.  We could improve
it by checking that name -> number mappings match only for fields that are stored or have
term vectors enabled (right now we check that all fields match), and by pre-sorting the field
names when assigning numbers.
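A minimal sketch of the two improvements suggested above, assuming a simplified model in which a segment's field numbering is just a map from field name to number. FieldNumbering, numberFields and canBulkMerge are hypothetical names, not Lucene's FieldInfos API. Numbering fields from a pre-sorted set of names makes the numbers independent of add order, and the merge check compares only fields that are stored or have term vectors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not Lucene's FieldInfos API.
class FieldNumbering {
    // Numbers field names 0..n-1 in sorted order, so two segments that saw
    // the same set of names get identical numbers no matter in which order
    // their documents added the fields.
    static Map<String, Integer> numberFields(List<String> namesInAddOrder) {
        Map<String, Integer> numbers = new HashMap<>();
        for (String name : new TreeSet<>(namesInAddOrder)) {
            numbers.put(name, numbers.size());
        }
        return numbers;
    }

    // Bulk copying stored fields / term vectors is safe only if every field
    // that is stored or has term vectors maps to the same number in both
    // segments; all other fields are ignored by this check.
    static boolean canBulkMerge(Map<String, Integer> a, Map<String, Integer> b,
                                Set<String> storedOrVectored) {
        for (String name : storedOrVectored) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check, two segments that differ only in unstored, non-vectored fields would still qualify for bulk merging.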

I plan to just move the stored fields writer up in the indexing chain, so that it receives
the in-order list of fields, not the coalesced & sorted list.
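To illustrate why moving the stored fields writer ahead of the sort preserves add order, here is a simplified model of the chain. DocField, IndexingChainSketch and its methods are hypothetical stand-ins, not classes from Lucene's indexing chain. The point is only that a consumer which sees the list before it is sorted by name sees the original order, while the inverting consumers still get multiple values of a field grouped together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified field: just a name and a value.
class DocField {
    final String name;
    final String value;

    DocField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical model of two consumers in the indexing chain; not Lucene code.
class IndexingChainSketch {
    // Stands in for the stored fields writer: records field names in exactly
    // the order it receives them.
    static List<String> writeStored(List<DocField> fields) {
        List<String> written = new ArrayList<>();
        for (DocField f : fields) {
            written.add(f.name);
        }
        return written;
    }

    // Stands in for the inverting consumers: sorts a copy by name so all
    // values of a multi-valued field are adjacent. List.sort is stable, so
    // values of the same field keep their relative order.
    static List<DocField> sortForInversion(List<DocField> fields) {
        List<DocField> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparing((DocField f) -> f.name));
        return sorted;
    }

    // 2.4 behavior: the stored fields writer sits after the sort.
    static List<String> storedAfterSort(List<DocField> doc) {
        return writeStored(sortForInversion(doc));
    }

    // Proposed behavior: the stored fields writer sees the original list;
    // only the inverting consumers get the sorted copy.
    static List<String> storedBeforeSort(List<DocField> doc) {
        List<String> stored = writeStored(doc);
        List<DocField> forInversion = sortForInversion(doc);
        // forInversion would be handed to inversion / term vectors here.
        return stored;
    }
}
```

Because the sort works on a copy, the only cost of the reordering in this model is that the stored fields writer no longer gets the coalesced list.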

bq. Based on McCandless's comments in email, it sounds like order was only ever maintained for
fields that don't use term vectors - in which case the documentation was only ever partially
correct.

Actually, order was correctly maintained prior to 2.3.  In 2.3, it was maintained only if
you had no term vector fields (i.e., we only sorted when at least one field had term
vectors enabled).  In 2.4 we always sort, so order is never maintained.  For 2.9 I think
we should fix it again so that order is fully maintained.

bq.  For example, one can think of a simple way to improve the performance of loading only
certain fields

I think that'd be a good improvement to how fields are stored!

> Order of stored Fields not maintained
> -------------------------------------
>
>                 Key: LUCENE-1727
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1727
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1
>            Reporter: Hoss Man
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>
> As noted in these threads...
> http://www.nabble.com/Order-of-fields-returned-by-Document.getFields%28%29-to21034652.html
> http://www.nabble.com/Order-of-fields-within-a-Document-in-Lucene-2.4%2B-to24210597.html
> somewhere prior to Lucene 2.4.1 a change was introduced that prevents the Stored fields
of a Document from being returned in the same order that they were originally added in.  This
can cause serious performance problems for people attempting to use LoadFirstFieldSelector
or a custom FieldSelector with the LOAD_AND_BREAK or SIZE_AND_BREAK options (since the
fields don't come back in the order they expect).
> Speculation in the email threads is that the origin of this bug is code introduced by
LUCENE-1301 -- but the purpose of that issue was refactoring, so if it really is the cause
of the change this would seem to be a bug, and not a side effect of a conscious implementation
change.
> Someone who understands indexing internals should investigate this.  At a minimum, if
it's decided that this is not actually a bug, then prior to resolving this bug the wiki docs
and some of the FieldSelector javadocs should be updated to make it clear what order Fields
will be returned in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



