lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Patrick Johnstone" <>
Subject RE: Order of fields returned by Document.getFields()
Date Wed, 17 Dec 2008 14:59:52 GMT

> >
> > I'm using Lucene via Solr and recently upgraded from an 
> early Summer 
> > nightly build to the released version of Solr 1.3 (which 
> seems to use 
> > something in the neighborhood of Lucene 2.3).  I'm posting 
> this here 
> > because I believe that my issue is with Lucene, not Solr.
> Do you know the version of Lucene in that version of Solr 
> (from this summer).  If you open up the JAR, it should be in 
> the Manifest.  With that info, I can go back and look at that 
> revision of Lucene.  I'm guessing that it was at least 2.3 as 
> well, but I'm not sure.

All of the lucene jar files from the Solr nightly are 2.3.1.

> >
> >
> > After the upgrade, I noticed that the order of fields being 
> returned 
> > for documents had changed.  Previously, the order of fields being 
> > returned was the same as the order in which they were added to the 
> > document (which is what's stated in the FAQ and other places I came 
> > across but not specifically spelled out in the Javadoc).
> > Now, the fields always seem to come back in lexicographic order by 
> > field name.
> I would agree these are contradictory.  I've always understood the  
> contract to be such that they would be returned in order of 
> addition.   
> Still, seems like it isn't something I would rely on.

Given no specific statement of contract in the Javadoc, I would
agree that the returned order of fields would be undefined.  But
that didn't stop me from being hopeful that the original order
would be maintained. :-)

> >
> >
> > The application that I'm building took some advantage of the fact  
> > that the
> > fields were returned in the orignial order (becuase the 
> order had some
> > meaning) and it may be difficult for me to work around this change.
> >
> Can you describe your use case a bit more?  Perhaps we can 
> brainstorm  
> some alternatives, just to give you options.

The documents in the index are largely free-form -- that is, I have
no a priori knowledge of what fields are in a given document.  There
are some well-defined fields that have a known type, particularly the
default field which is where most (but not all) of the searching is
done.  The best way to think of it may be that a document has chunk of
searchable text in the default field and the other fields represent
various metadata (potentially searchable) that the document creator
has defined.  In some cases, the order of the metatdata fields is
significant (e.g. more "interesting" fields are added earlier).

My first thought as an alternative would be to create a new field
(stored but not indexed) that just contained the names of the fields
in their original order.  I could probably live with the extra storage
requirements this would cause but there would be a fair amount of
repetitive data.  (The documents tend to fall into types which are
somewhat similar as a group, thus their field orders would be similar.)

Having the field order implicitly defined by the indexing process
would be more efficient but I don't have a grasp of the Lucene internals
so I don't know what forces drove the code change.



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message