lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 26702] - [PATCH] arbitrary sorting
Date Mon, 09 Feb 2004 20:09:14 GMT

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26702

[PATCH] arbitrary sorting





------- Additional Comments From tjones@hoovers.com  2004-02-09 20:09 -------
So I'm thinking something like this:

in Searchable:
 
   TopFieldDocs search (Query q, Filter f, int n, String[] sort_fields)

where: 

   TopFieldDocs is like TopDocs, but made up of FieldDoc entries:
   
   class FieldDoc {
      int doc;
      float score;
      String[] field_values;   // correlates to sort_fields[]
   }

There are some issues to think about, though:

Since field_values[] could hold integers, floats or strings, we could make it
an Object[] array.  But this might not be as efficient as just using the string
term values we already have.  On the other hand, the data transport layer isn't
so efficient because it converts to and from string values.  So I'm not really
sure.  Any opinions?

Secondly, since we already have the document number and relevancy in the
structure, in cases where the sort is by a mixture of field(s) and either
relevancy or document number, the field_values[] array doesn't exactly match
the original sort_fields[] array.  Or, we could go ahead and put those values
in the array as strings (or Objects).
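
To make the mismatch concrete, here is a rough sketch of a comparator for the
first option, where relevancy and document number are left out of
field_values[].  Everything in it is an assumption made for illustration: the
reserved names SCORE and DOC, the class names, and the use of plain string
comparison for field values.  It is not taken from the patch.

```java
import java.util.Comparator;

/** Illustrative only: sorts by a mix of field values, score and doc number. */
final class MixedSortSketch {
    // Hypothetical reserved names standing in for "relevancy" and "document number".
    static final String SCORE = "<score>";
    static final String DOC = "<doc>";

    static final class FieldDoc {
        final int doc;
        final float score;
        final String[] fieldValues; // one entry per real field in sortFields, in order

        FieldDoc(int doc, float score, String... fieldValues) {
            this.doc = doc;
            this.score = score;
            this.fieldValues = fieldValues;
        }
    }

    static Comparator<FieldDoc> comparator(String[] sortFields) {
        return (a, b) -> {
            int fieldIndex = 0; // index into fieldValues, which skips SCORE and DOC
            for (String key : sortFields) {
                int c;
                if (SCORE.equals(key)) {
                    c = Float.compare(b.score, a.score); // higher score first
                } else if (DOC.equals(key)) {
                    c = Integer.compare(a.doc, b.doc);   // lower doc number first
                } else {
                    // Plain string comparison; typed comparison is a separate question.
                    c = a.fieldValues[fieldIndex].compareTo(b.fieldValues[fieldIndex]);
                    fieldIndex++;
                }
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        };
    }
}
```

Note the separate fieldIndex counter: it is exactly the bookkeeping that goes
away if the score and document number are copied into the array as well.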

Thirdly, this approach means we rely upon pattern matching to detect whether
the field terms are integers, floats or strings.  This works well enough unless
there are terms that look like integers or floats but really are strings
(assuming we look for integers first, then floats, then assume strings).
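
As a minimal sketch of that detection order (integers, then floats, then
strings), something like the following would do.  The class name, the enum and
the two regular expressions are my own assumptions; the patch may well define
"looks like a float" differently.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: guesses a term's type from its text. */
final class TermTypeSniffer {
    enum TermType { INTEGER, FLOAT, STRING }

    // Optional sign followed by digits only.
    private static final Pattern INT_PATTERN = Pattern.compile("[-+]?\\d+");
    // Optional sign, digits containing a decimal point, optional exponent.
    private static final Pattern FLOAT_PATTERN =
        Pattern.compile("[-+]?(\\d+\\.\\d*|\\.\\d+)([eE][-+]?\\d+)?");

    private TermTypeSniffer() {}

    /** Tries integer first, then float, and otherwise falls back to string. */
    static TermType detect(String term) {
        if (INT_PATTERN.matcher(term).matches()) {
            return TermType.INTEGER;
        }
        if (FLOAT_PATTERN.matcher(term).matches()) {
            return TermType.FLOAT;
        }
        return TermType.STRING;
    }
}
```

The failure case is easy to hit: a postal code such as "90210" is classified
as INTEGER here even though the application treats it as a string.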

For this last problem, I'm wondering whether we might want to extend the Field
class to have IntegerField and FloatField.  This provides error checking on the
data on the way into the index, so when we get it back out we can be reasonably
sure we have clean data (i.e. we don't try to interpret the contents of a field
as an int when it really isn't one).  Kind of like DateField.
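
Roughly what I have in mind, as a sketch only: a pair of static converters in
the spirit of DateField.  The class and method names are made up for this
example, and it deliberately ignores how the stored string should be encoded
for term ordering.

```java
/** Hypothetical sketch of an IntegerField-style helper; not an existing Lucene class. */
final class IntegerFieldSketch {
    private IntegerFieldSketch() {}

    /** Checks the data on the way in: rejects anything that is not a well-formed int. */
    static String encode(String raw) {
        try {
            // Round-tripping through int also normalizes input such as "007" to "7".
            return Integer.toString(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not an integer: " + raw, e);
        }
    }

    /** Reads a value back out; safe as long as every stored value came from encode(). */
    static int decode(String stored) {
        return Integer.parseInt(stored);
    }
}
```

With this, bad data fails loudly at indexing time instead of surfacing later as
a misdetected type during sorting.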

Following on from this: is there any place to store and then retrieve what data
type a field was given?  Is there someplace in the index to say a field is only
integers or only floats?  Then we wouldn't have to rely upon pattern matching.
