lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Avoiding sort by date
Date Fri, 13 Oct 2006 14:56:17 GMT

On Oct 12, 2006, at 9:25 PM, <rayvittal-lists@yahoo.com> wrote:
> Does the Sort function create some kind of internal cache?  
> Observing the heap, it seems that a full garbage collection after  
> calling IndexSearcher.close() still leaves a lot of memory occupied.

Yes, sorting caches, potentially a lot.  I'm not sure what could be  
going on with the excessive memory usage, but certainly the sorting  
caches can be large.



>
> Thanks
> --
> Solidguy
>
> ----- Original Message ----
> From: Erik Hatcher <erik@ehatchersolutions.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, October 12, 2006 12:58:50 PM
> Subject: Re: Avoiding sort by date
>
> You really should be using the same IndexSearcher for successive
> searches.  Sorting works best when done with a "warm" searcher.  Have
> a look at Solr's warming strategy, and consider adopting that in some
> way.
>
>     Erik
>
>
> On Oct 12, 2006, at 3:04 PM, <rayvittal-lists@yahoo.com> wrote:
>
>> Hi folks,
>>
>> I am using Lucene 2.0
>>
>> In our application, I am indexing a stream of documents. Each
>> document is fairly small (< 1 KB), but there can be 10's of
>> millions of documents. Each document has a Timestamp field. Users
>> can enter free-form searches and a date/time range. They are most
>> interested in the most recent documents (as indicated in the
>> Timestamp field). An obvious way to do achieve this is to
>> searcher = new IndexSearcher(indexDir);
>> RangeFilter rf = new RangeFilter("day", start, end, true, true);
>> hits = searcher.search(query,rf,new Sort(new SortField[]{
>>                     new SortField
>> ("timestamp",SortField.STRING,true )}));
>>
>> Depending on the query, there may be millions of hits results. If
>> the same query is executed several times in quick succession, the
>> heap quickly runs out of memory. I suspect that this is because
>> Lucene needs to load all the millions of hits in order to sort the
>> results.
>>
>> My idea is to avoid the Sort() entirely. Is there a way, during
>> indexing (or by setting Weights inside the query) to automatically
>> set the score for more recent documents higher?
>>
>> Thanks
>> --
>> Solidguy
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message