lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Sorting memory-efficiently by any numeric field (dates too?)
Date Tue, 12 Nov 2013 17:00:31 GMT
Before I go and pat myself on the back, what do people think about this
trick? The base problem is "Is there a space-efficient way to return the
top N documents, sorted by a numeric field". The numeric field includes
dates.

It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If
we could return the numeric field in question as the score of a document,
the sort should work without allocating the internal arrays that hold the
timestamps for every document.

So what about something like this?
/select?q={!boost b=manufacturedate_dt}text:*
and reverse order by
/select?q={!boost b=div(1,manufacturedate_dt)}text:*
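
For anyone who wants to try the two queries above from a script, here is a minimal sketch that builds the request URLs with proper escaping. The base URL, the core name `collection1`, and the `rows`/`fl` values are assumptions for illustration only; the `q` values are the ones from this message.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def boost_sort_url(field, descending=True, rows=10):
    """Build a /select URL that uses the field value as the boost.

    descending=True boosts by the raw value (largest first);
    descending=False boosts by div(1,field) for the reverse order.
    """
    boost = field if descending else "div(1," + field + ")"
    params = {
        "q": "{!boost b=" + boost + "}text:*",
        "rows": rows,          # only the top N are requested
        "fl": "id,score",      # show the score that drives the ordering
    }
    # urlencode escapes the braces, '!', '=', ':' and '*' in the query.
    return SOLR_SELECT + "?" + urlencode(params)

if __name__ == "__main__":
    print(boost_sort_url("manufacturedate_dt"))
    print(boost_sort_url("manufacturedate_dt", descending=False))
```
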

It works on the test data. So let's assume that we're space constrained. It
_seems_ like this would only allocate enough space for the top N documents
in the result set, which is insignificant in terms of memory consumption
compared with the number of documents in a large core. Any obvious problems
that people see?
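
The memory argument can be illustrated with a bounded priority queue: ranking by score needs only N slots, no matter how many documents match. This is a minimal sketch of that general idea, not Lucene's collector code; the function name and the `(score, doc_id)` tuples are made up for the example.

```python
import heapq

def top_n_by_score(scored_docs, n):
    """Return the n highest-scoring (score, doc_id) pairs, best first.

    At most n entries are held at any time, however many documents
    stream past.
    """
    heap = []  # min-heap: heap[0] is the weakest of the current top n
    for score, doc_id in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            # New document beats the weakest kept entry: swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

if __name__ == "__main__":
    # Scores stand in for timestamps returned as the document score.
    stream = ((ts, doc) for doc, ts in enumerate([5, 1, 9, 3, 7]))
    print(top_n_by_score(stream, 2))
```
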

I see a couple of shortcomings:

1> You only get one field. Unless you can create a really clever function
that incorporates all the values in multiple fields, this is going to be
hard to use with more than one field.

2> The boost syntax doesn't allow for a *:*, so you have to specify an
existing field. If there happen to be documents that don't have anything in
the field, you'll miss them.

3> I'm not sure what the performance implications are, especially in the
case where _every_ document scores better than the current top N.
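
To get a feel for the worst case in point 3, here is a small sketch that counts how often a size-N queue has to be updated. It models a generic bounded min-heap, which is an assumption about the shape of the problem, not a measurement of Solr itself.

```python
import heapq

def count_queue_updates(scores, n):
    """Count how many documents displace an entry in a size-n min-heap."""
    heap, updates = [], 0
    for s in scores:
        if len(heap) < n:
            heapq.heappush(heap, s)      # still filling the first n slots
        elif s > heap[0]:
            heapq.heapreplace(heap, s)   # beats the weakest kept score
            updates += 1
    return updates

if __name__ == "__main__":
    # Ascending scores: every document beats the current top 10.
    print(count_queue_updates(range(1000), 10))
    # Descending scores: nothing after the first 10 gets in.
    print(count_queue_updates(range(999, -1, -1), 10))
```

In this model each update costs on the order of log N, so even the worst case stays cheap for small N. Whether that carries over to a real index is exactly the open question.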

Erick
