lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Order of fields within a Document in Lucene 2.4+
Date Wed, 01 Jul 2009 01:20:50 GMT
Yeah, I've heard rumblings about this issue before. I can't remember what
patch changed it though - one of Mike M's I think?

On Tue, Jun 30, 2009 at 8:40 PM, Chris Hostetter
<hossman_lucene@fucit.org>wrote:

>
> Hmmm... i'm not an expert on the internals of indexing, and i don't use
> FieldSelectors much, but this seems like a pretty big bug to me ... or at
> the very least: a change in behavior that completely eliminates the value
> of LOAD_AND_BREAK.
>
> https://issues.apache.org/jira/browse/LUCENE-1727
>
>
>
> : The Lucene FAQ says...
> :
> : What is the order of fields returned by Document.fields()?
> : * Fields are returned in the same order they were added to the document.
> : (now getFields() as fields is deprecated)
> :
> : However I think this may no longer be the case in 2.4
> :
> : We are indexing documents in a specific order so that we can
> LOAD_AND_BREAK out of our FieldSelector as early as possible.
> : i.e. we have typically 50 indexed fields for a document, but when we are
> loading results with .doc(), we know we only need 4 of them.
> :
> : So, our code ensures that these are added to the index first - and once
> the 4th field is loaded we break out of the selector.
> :
> : This speeds us up by an order of magnitude.
> :
> :
> :
> : However, we are finding that our field selector is processing fields in
> alphabetical order, not order of addition.  This means that we'd have to
> rename our fields to 'aaa..' in order to guarantee they'd be processed
> first.
> :
> :
> : I think, but am not sure, that this bit of code causes the problem (as
> spotted in
> http://www.mail-archive.com/java-user@lucene.apache.org/msg24105.html).
> : It seems to have been introduced in version 2.4 (fields are in addition
> order in 2.3.2)
> :
> : DocFieldProcessorPerThread.java:
> :
> :    // If we are writing vectors then we must visit
> :    // fields in sorted order so they are written in
> :    // sorted order.  TODO: we actually only need to
> :    // sort the subset of fields that have vectors
> :    // enabled; we could save [small amount of] CPU
> :    // here.
> :    quickSort(fields, 0, fieldCount-1);
> :
> :
> : This appears to sort fields into alphabetical order.
> :
> : Assuming that implementing the TODO would keep them in order of addition
> (and just keep vectors fields themselves sorted) - is it worth raising a
> JIRA to fix this ?
> :
> :
> : regards,
> :
> : matt
> :
> :
> :
> :
> : _________________________________________________________________
> : Get the best of MSN on your mobile
> : http://clk.atdmt.com/UKM/go/147991039/direct/01/
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message