lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <rayvittal-li...@yahoo.com>
Subject Avoiding sort by date
Date Thu, 12 Oct 2006 19:04:42 GMT
Hi folks,

I am using Lucene 2.0

In our application, I am indexing a stream of documents. Each document is fairly small (<
1 KB), but there can be 10's of millions of documents. Each document has a Timestamp field.
Users can enter free-form searches and a date/time range. They are most interested in the
most recent documents (as indicated in the Timestamp field). An obvious way to do achieve
this is to 
searcher = new IndexSearcher(indexDir);
RangeFilter rf = new RangeFilter("day", start, end, true, true);
hits = searcher.search(query,rf,new Sort(new SortField[]{
                    new SortField("timestamp",SortField.STRING,true )}));

Depending on the query, there may be millions of hits results. If the same query is executed
several times in quick succession, the heap quickly runs out of memory. I suspect that this
is because Lucene needs to load all the millions of hits in order to sort the results.

My idea is to avoid the Sort() entirely. Is there a way, during indexing (or by setting Weights
inside the query) to automatically set the score for more recent documents higher?

Thanks
--
Solidguy



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message