lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 26702] - [PATCH] arbitrary sorting
Date Wed, 18 Feb 2004 23:42:01 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26702>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26702

------- Additional Comments From tjones@hoovers.com  2004-02-18 23:42 -------
Doug et al - a question:

In writing the unit tests I found a problem (imagine that :) with the current
implementation and multisearchers, and I was wondering how you would prefer to
handle it.  

Currently, when a sort is done by string, all the terms are looked up, sorted,
and given a numerical index.  Only the numerical index is stored - the strings
are thrown away.  Then, when a sort is done by the given field, the numerical
values are used to put the hits in order.  This is fast and uses as little
memory as possible.
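
As a rough illustration of the scheme described above (not the actual patch
code; the class and method names here are made up for the example), the terms
of one index can be sorted and each document given the rank of its term:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene code: one sort term per document, one index.
public class OrdinalSortSketch {
    // Returns, for each document, the 1-based rank of its term among the
    // distinct terms of this index in sorted order.
    static int[] assignOrdinals(String[] termPerDoc) {
        TreeSet<String> sortedTerms = new TreeSet<>(Arrays.asList(termPerDoc));
        Map<String, Integer> rank = new HashMap<>();
        int next = 1;
        for (String term : sortedTerms) {
            rank.put(term, next++);
        }
        int[] ordinals = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            ordinals[doc] = rank.get(termPerDoc[doc]);
        }
        // Only the int array is returned; the strings can now be discarded.
        return ordinals;
    }

    public static void main(String[] args) {
        int[] ords = assignOrdinals(new String[] {"c", "a", "b", "a"});
        System.out.println(Arrays.toString(ords)); // prints [3, 1, 2, 1]
    }
}
```

Sorting hits then only needs integer comparisons on the returned array.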

However, when using a multisearcher and sorting over the same field, the hits
come back from the individual searchers and each one contains its numerical sort
value (not the original strings - remember, they were thrown away).  Those
values are only comparable within the index that produced them, so a problem
occurs if the individual searchers do not have the same terms in the sort field.
If index A contains only the terms "a", "b", and "c" (which are given integer
values 1, 2, 3) and index B contains only the terms "r", "s", "t" (which are
also given values 1, 2, 3) then when the multisearcher collates them, it comes
out something like "a", "r", "b", "s", "c", "t".
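
The index A / index B example can be reproduced with a small stand-alone
sketch.  The Hit class and the merge method are invented for the illustration
and are not the multisearcher's real code; the term is carried along only so
the merged order can be printed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of collating hits by per-index ordinals.
public class OrdinalMergeProblem {
    static final class Hit {
        final String term;   // kept only for display; a real hit has just the ordinal
        final int ordinal;   // rank of the term within its own index
        Hit(String term, int ordinal) { this.term = term; this.ordinal = ordinal; }
    }

    // Merges hits from two searchers by comparing their ordinals directly.
    static List<String> mergeByOrdinal(List<Hit> fromA, List<Hit> fromB) {
        List<Hit> all = new ArrayList<>(fromA);
        all.addAll(fromB);
        // List.sort is stable, so on equal ordinals A's hit stays ahead of B's.
        all.sort(Comparator.comparingInt((Hit h) -> h.ordinal));
        List<String> terms = new ArrayList<>();
        for (Hit h : all) {
            terms.add(h.term);
        }
        return terms;
    }

    // Builds the two indexes from the example and merges them.
    static List<String> demo() {
        List<Hit> a = Arrays.asList(new Hit("a", 1), new Hit("b", 2), new Hit("c", 3));
        List<Hit> b = Arrays.asList(new Hit("r", 1), new Hit("s", 2), new Hit("t", 3));
        return mergeByOrdinal(a, b);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, r, b, s, c, t]
    }
}
```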

To solve this, we need to either:

[1] keep all the term values in memory
    (too much memory)

[2] after getting the list of hits, go back and look up the term values again
    (not very efficient)

[3] not allow sorting by strings using a multisearcher
    (not very nice)

Also, however we solve it, it will probably need to apply to simple single
indexes as well as multi/remote indexes, since the API is shared.

Any thoughts or ideas?

I guess I am thinking [2] (going back to look up the term values after getting
the list of hits) is the way to go ...
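
A minimal sketch of what looking the term values up again might look like,
assuming each index can hand back the sort term for a given document number.
Here a plain String array stands in for that lookup; the names are
hypothetical and nothing below is an existing Lucene API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: resolve terms for the returned hits only, then merge.
public class ResolveTermsSketch {
    // hitsX holds the document numbers returned by searcher X; termsX[doc]
    // stands in for re-reading the sort term of that document from index X.
    static List<String> mergeByTerm(int[] hitsA, String[] termsA,
                                    int[] hitsB, String[] termsB) {
        List<String> resolved = new ArrayList<>();
        for (int doc : hitsA) {
            resolved.add(termsA[doc]);
        }
        for (int doc : hitsB) {
            resolved.add(termsB[doc]);
        }
        // Strings compare correctly across indexes, unlike per-index ordinals.
        Collections.sort(resolved);
        return resolved;
    }

    public static void main(String[] args) {
        List<String> merged = mergeByTerm(
            new int[] {0, 1, 2}, new String[] {"a", "b", "c"},
            new int[] {0, 1, 2}, new String[] {"r", "s", "t"});
        System.out.println(merged); // prints [a, b, c, r, s, t]
    }
}
```

The extra cost is one term lookup per returned hit rather than holding every
term of every index in memory.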

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

