From Dmitry Serebrennikov <>
Subject Sorting vs. document boosting
Date Fri, 19 Oct 2001 18:45:35 GMT
With the addition of term vectors, I think there are new ways that we 
can now do sorting in Lucene. It's different from boosting documents, 
but sometimes it is specifically sorting that is required. I've written 
HitCollectors that retrieve term vectors on each call to collect() and 
the performance remains quite acceptable (that's even without further 
optimizations to the term vectors). So it is now possible to write 
HitCollectors that retrieve term vectors for a specific field and sort 
documents they collect by values in those fields. If the field is a 
"keyword" it will only have a single term, which is probably most 
appropriate for sorting. The term numbers used by the term vectors are 
designed to be ordered by lexicographical ordering of the terms 
themselves. In other words, if the number of term A is less than the 
number of term B, then term A also sorts before term B, so a collector 
can compare the integer term numbers instead of the term strings. If 
the term text is chosen with thought, this will allow very efficient 
sorting, I think.
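
To make the idea concrete, here is a minimal, self-contained sketch. It does not use any Lucene classes: TermNumberSortSketch, Hit, buildDictionary, termNumber and collectSorted are all hypothetical names, and the "dictionary" is just a sorted array standing in for the term numbering described above. It assumes each document has exactly one keyword term in the sort field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names are Lucene APIs.
class TermNumberSortSketch {

    /** A collected hit: document id plus the term number of its single keyword term. */
    static final class Hit {
        final int doc;
        final int termNumber;
        Hit(int doc, int termNumber) { this.doc = doc; this.termNumber = termNumber; }
    }

    /** Term numbers are positions in the lexicographically sorted list of distinct terms. */
    static String[] buildDictionary(String[] termPerDoc) {
        return Arrays.stream(termPerDoc).distinct().sorted().toArray(String[]::new);
    }

    /** Looks up a term's number by binary search in the sorted dictionary. */
    static int termNumber(String[] dictionary, String term) {
        return Arrays.binarySearch(dictionary, term);
    }

    /** Collects the matching docs and returns their ids ordered by term number. */
    static int[] collectSorted(String[] termPerDoc, int[] matchingDocs, boolean ascending) {
        String[] dict = buildDictionary(termPerDoc);
        List<Hit> hits = new ArrayList<>();
        for (int doc : matchingDocs) {
            // Stand-in for fetching the field's term vector inside collect().
            hits.add(new Hit(doc, termNumber(dict, termPerDoc[doc])));
        }
        // Integer comparison only; no string comparison happens at sort time.
        Comparator<Hit> byTerm = Comparator.comparingInt(h -> h.termNumber);
        hits.sort(ascending ? byTerm : byTerm.reversed());
        return hits.stream().mapToInt(h -> h.doc).toArray();
    }

    public static void main(String[] args) {
        // Dates written as yyyyMMdd sort lexicographically in date order.
        String[] dates = {"20011019", "20010101", "20010615", "20010101"};
        int[] order = collectSorted(dates, new int[] {0, 1, 2, 3}, true);
        System.out.println(Arrays.toString(order));
    }
}
```

The yyyyMMdd strings are an example of term text "chosen with thought": their lexicographic order matches the order we actually want, so the term numbers alone are enough to sort on.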

Perhaps sorting should be a property that is set on a query? It could be 
as simple as query.sortBy(String field, boolean ascending). The query 
will then sort by the specified field's first term. The field must be 
vectorized. This way it could influence the behavior of the default 
HitCollector (which now sorts only by score).
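
As a rough illustration of how such a property might steer the default ordering, here is a standalone sketch. SortableQuerySketch and its order() method are hypothetical and are not part of Lucene; only the sortBy(String, boolean) signature comes from the proposal above. The first term number of the sort field is assumed to be already known for each hit.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the proposed query.sortBy(field, ascending); not a Lucene API.
class SortableQuerySketch {
    private String sortField;          // null means "sort by score", the current default
    private boolean ascending = true;

    /** Records the requested ordering; returns this so calls can be chained. */
    SortableQuerySketch sortBy(String field, boolean ascending) {
        this.sortField = field;
        this.ascending = ascending;
        return this;
    }

    /**
     * Orders hits the way a default collector might: by descending score when
     * no sort field is set, otherwise by the sort field's first term number.
     * Each hit is {doc, firstTermNumber}; scores[doc] is that document's score.
     */
    int[] order(int[][] hits, float[] scores) {
        Comparator<int[]> cmp;
        if (sortField == null) {
            cmp = (a, b) -> Float.compare(scores[b[0]], scores[a[0]]);
        } else {
            cmp = Comparator.comparingInt(h -> h[1]);
            if (!ascending) {
                cmp = cmp.reversed();
            }
        }
        return Arrays.stream(hits).sorted(cmp).mapToInt(h -> h[0]).toArray();
    }
}
```

With no sortBy() call the hits come back by score as they do now; after query.sortBy("date", false) the same hits come back by descending term number of the (assumed) "date" field.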

