lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Order of fields returned by Document.getFields()
Date Wed, 17 Dec 2008 14:08:24 GMT

On Dec 16, 2008, at 10:00 AM, Patrick Johnstone wrote:

>
> I'm using Lucene via Solr and recently upgraded from an early Summer nightly
> build to the released version of Solr 1.3 (which seems to use something in
> the neighborhood of Lucene 2.3).  I'm posting this here because I believe
> that my issue is with Lucene, not Solr.

Do you know which version of Lucene was in that earlier Solr build (the
one from this summer)?  If you open up the JAR, the version should be in
the Manifest.  With that info, I can go back and look at that revision
of Lucene.  I'm guessing that it was at least 2.3 as well, but I'm not sure.

>
>
> After the upgrade, I noticed that the order of fields being returned for
> documents had changed.  Previously, the order of fields being returned was
> the same as the order in which they were added to the document (which is
> what's stated in the FAQ and other places I came across but not specifically
> spelled out in the Javadoc).
> Now, the fields always seem to come back in lexicographic order by field
> name.

I agree that the behavior you are seeing contradicts the FAQ.  I've
always understood the contract to be that fields are returned in order
of addition.  Still, it doesn't seem like something I would rely on.


>
>
> I believe (but am by no means sure) that this is being caused by the
> following bit of code in DocFieldProcessorPerThread.java:
>
>    // If we are writing vectors then we must visit
>    // fields in sorted order so they are written in
>    // sorted order.  TODO: we actually only need to
>    // sort the subset of fields that have vectors
>    // enabled; we could save [small amount of] CPU
>    // here.
>    quickSort(fields, 0, fieldCount-1);
>
>    for(int i=0;i<fieldCount;i++)
>      fields[i].consumer.processFields(fields[i].fields, fields[i].fieldCount);
>
> This code seems to sort the fields of the document into order before
> processing them.  If this is true, then the original field order is lost and
> can't ever be recovered.  (None of the fields in our index use TermVectors.)
>
> My questions are:  Is my reading of this behavior correct?  ...and...

I believe it is, but it would be good to have Mike comment, as he wrote
this code.
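
To make the concern concrete, here is a minimal sketch in plain Java (not
Lucene code). It assumes the reading above is right and uses plain strings
as stand-ins for the per-field objects; the class and method names are
made up for illustration. It only shows that an in-place sort by name
discards the order of addition unless a copy is kept somewhere else.

```java
import java.util.Arrays;

// Illustration only: plain strings stand in for Lucene's per-field objects.
class FieldOrderDemo {

    // Sorts the array in place by name, as an in-place quickSort would.
    static void sortInPlace(String[] fieldNames) {
        Arrays.sort(fieldNames);
    }

    public static void main(String[] args) {
        String[] fieldNames = {"title", "author", "body"}; // order of addition
        String[] original = fieldNames.clone();            // copy kept only for comparison

        sortInPlace(fieldNames);

        System.out.println("added:  " + Arrays.toString(original));
        System.out.println("sorted: " + Arrays.toString(fieldNames));
    }
}
```

After the sort, the array itself carries no trace of the order in which the
names were added; only the separate copy does.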


>
> if so:  Is there some other way to get the original order back?

Not sure yet.  The sort does seem to be needed when writing the term
vectors.

>
>
> The application that I'm building took some advantage of the fact that the
> fields were returned in the original order (because the order had some
> meaning) and it may be difficult for me to work around this change.
>

Can you describe your use case a bit more?  Perhaps we can brainstorm  
some alternatives, just to give you options.
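
As one possible alternative, and only as a sketch: the application could
record the order itself in an extra stored value and reorder the field
names after retrieval. The code below is plain Java with no Lucene calls.
The field name "_fieldOrder" and both helper methods are hypothetical, and
it assumes field names contain no commas and that each name occurs once
per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of an application-side workaround; nothing here is Lucene API.
class FieldOrderWorkaround {

    // Hypothetical name of an extra stored field that holds the order.
    static final String ORDER_FIELD = "_fieldOrder";

    // At index time: encode the names in the order they were added.
    static String encodeOrder(List<String> namesInAdditionOrder) {
        return String.join(",", namesInAdditionOrder);
    }

    // At retrieval time: put the returned names back into the recorded order.
    static List<String> restoreOrder(List<String> returnedNames, String encodedOrder) {
        List<String> recorded = Arrays.asList(encodedOrder.split(","));
        List<String> result = new ArrayList<>();
        // First, the recorded names that were actually returned.
        for (String name : recorded) {
            if (returnedNames.contains(name)) {
                result.add(name);
            }
        }
        // Then any unrecorded names, in returned order, skipping the helper field.
        for (String name : returnedNames) {
            if (!recorded.contains(name) && !name.equals(ORDER_FIELD)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Whether this fits depends on the use case, for example on whether the
documents have repeated field names, which this sketch does not handle.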

-Grant

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
