lucene-java-user mailing list archives

From <rayvittal-li...@yahoo.com>
Subject Re: Avoiding sort by date
Date Fri, 13 Oct 2006 01:25:19 GMT
Thanks, Erik, for the pointer to Solr.

Since documents are added to the index frequently, we have to create new IndexSearchers anyway.
We plan to 'age out' existing IndexSearchers and create new ones every few minutes.
Solr's cache regeneration would be useful in this scenario.
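
A minimal sketch of the 'age out' plan, in plain Java with no Lucene dependency so that it stays self-contained. AgingHolder and its factory, warmer, and closer callbacks are hypothetical names, not Lucene or Solr API. In this setting the factory would open a new IndexSearcher and the warmer would run one sorted query against it before it is handed out.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Shares one instance and replaces it once it is older than maxAgeMillis. */
final class AgingHolder<T> {
    private final Supplier<T> factory;  // assumption: opens a new searcher
    private final Consumer<T> warmer;   // assumption: runs a sorted query to fill caches
    private final Consumer<T> closer;   // assumption: closes the retired searcher
    private final long maxAgeMillis;
    private final LongSupplier clock;   // injectable so the aging logic can be tested

    private T current;
    private long createdAt;

    AgingHolder(Supplier<T> factory, Consumer<T> warmer, Consumer<T> closer,
                long maxAgeMillis, LongSupplier clock) {
        this.factory = factory;
        this.warmer = warmer;
        this.closer = closer;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Returns the shared instance, replacing it first if it has aged out. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || now - createdAt >= maxAgeMillis) {
            T fresh = factory.get();
            warmer.accept(fresh);       // warm before anyone can see the new instance
            T old = current;
            current = fresh;
            createdAt = now;
            if (old != null) {
                closer.accept(old);     // simplification: ignores searches still in flight
            }
        }
        return current;
    }
}
```

Two simplifications are worth noting: warming happens inside the lock, so callers wait while the new instance warms up, and the old instance is closed immediately. A production version would warm in the background and close the old instance only after in-flight searches have finished, for example with reference counting.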

Does the Sort function create some kind of internal cache? Observing the heap, it seems that
a full garbage collection after calling IndexSearcher.close() still leaves a lot of memory
occupied.
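
One plausible explanation, stated here as an assumption to check against the Lucene 2.0 source: field sorting goes through FieldCache, which builds per-field arrays sized to the whole index (not just the hits) and keeps them in a map weakly keyed by the IndexReader. If that holds, every new IndexSearcher(indexDir) opens its own reader and triggers a fresh load, and the old arrays become collectable only once nothing references the old reader. The sketch below illustrates that caching pattern in plain Java; PerReaderCache is a hypothetical name, not a Lucene class.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Caches one expensive value per key, held only while the key is reachable. */
final class PerReaderCache<R, V> {
    // Weak keys: an entry can be dropped once its key (the "reader") is unreachable.
    private final Map<R, V> cache = new WeakHashMap<>();
    private final Function<R, V> loader;  // assumption: builds the sort arrays for a reader
    private int loads = 0;

    PerReaderCache(Function<R, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this reader, loading it on first use. */
    synchronized V get(R reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = loader.apply(reader);
            loads++;
            cache.put(reader, value);
        }
        return value;
    }

    /** Number of times the loader has run, i.e. how many full loads were paid for. */
    synchronized int loadCount() {
        return loads;
    }
}
```

With a cache of this shape, reusing one reader pays for the load once, while opening a new reader per search pays for it every time. Memory held for an old reader stays occupied for as long as any strong reference to that reader survives.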

Thanks
--
Solidguy

----- Original Message ----
From: Erik Hatcher <erik@ehatchersolutions.com>
To: java-user@lucene.apache.org
Sent: Thursday, October 12, 2006 12:58:50 PM
Subject: Re: Avoiding sort by date

You really should be using the same IndexSearcher for successive  
searches.  Sorting works best when done with a "warm" searcher.  Have  
a look at Solr's warming strategy, and consider adopting that in some  
way.

    Erik


On Oct 12, 2006, at 3:04 PM, <rayvittal-lists@yahoo.com> wrote:

> Hi folks,
>
> I am using Lucene 2.0
>
> In our application, I am indexing a stream of documents. Each
> document is fairly small (< 1 KB), but there can be tens of
> millions of documents. Each document has a Timestamp field. Users
> can enter free-form searches and a date/time range. They are most
> interested in the most recent documents (as indicated in the
> Timestamp field). An obvious way to achieve this is:
> searcher = new IndexSearcher(indexDir);
> RangeFilter rf = new RangeFilter("day", start, end, true, true);
> hits = searcher.search(query, rf, new Sort(new SortField[] {
>     new SortField("timestamp", SortField.STRING, true)}));
>
> Depending on the query, there may be millions of hits. If
> the same query is executed several times in quick succession, the
> heap quickly runs out of memory. I suspect that this is because
> Lucene needs to load all the millions of hits in order to sort the
> results.
>
> My idea is to avoid the Sort() entirely. Is there a way, during  
> indexing (or by setting Weights inside the query) to automatically  
> set the score for more recent documents higher?
>
> Thanks
> --
> Solidguy
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
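
The idea in the quoted message, giving more recent documents a higher score at indexing time, could be sketched as below. This is only an illustration: RecencyBoost and its parameters are hypothetical, and the result would presumably be passed to Document.setBoost(float) when each document is indexed. A boost biases ranking but does not impose a strict date order, and as far as I know index-time boosts are stored with very low precision, so fine timestamp differences would be lost.

```java
/** Maps a document timestamp to an index-time boost that grows with recency. */
final class RecencyBoost {
    private RecencyBoost() {}

    /**
     * Linearly interpolates from 1.0 (at epochMillis or earlier) to maxBoost
     * (at maxMillis or later). All four parameters are assumptions chosen by
     * the application; none of them comes from Lucene.
     */
    static float boostFor(long docMillis, long epochMillis, long maxMillis, float maxBoost) {
        if (maxMillis <= epochMillis) {
            throw new IllegalArgumentException("maxMillis must be after epochMillis");
        }
        // Clamp the timestamp into [epochMillis, maxMillis].
        long clamped = Math.max(epochMillis, Math.min(docMillis, maxMillis));
        double fraction = (clamped - epochMillis) / (double) (maxMillis - epochMillis);
        return (float) (1.0 + fraction * (maxBoost - 1.0));
    }
}
```

For example, RecencyBoost.boostFor(50, 0, 100, 3f) gives 2.0, halfway between the minimum boost of 1.0 and the maximum of 3.0. The fixed upper bound is the main weakness for a continuous stream of documents, since the range has to be chosen before the newest documents exist.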

