lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Sort Performance Question
Date Tue, 20 Mar 2007 20:36:02 GMT
In a web application, I have generally cached IndexSearcher in  
application scope and reused it for all requests.

You will have to balance the demand for timeliness of updates against
the time it takes to build up the sort caches.  You can't really have
both instantaneous viewing of newly added documents and fully optimized
sorting (or any other operation that relies on building up caches
from an IndexReader/IndexSearcher).  Many folks have implemented
IndexSearcher warming in the background of their applications,
which is also a prominent feature of Solr.  So you may want to
have a look at how Solr does its magic, or simply use Solr flat out :)

	Erik
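
[Editor's sketch] For readers wondering how a cached, background-warmed
searcher might look: the following is a minimal sketch under stated
assumptions, not Solr's implementation and not code from this thread. It is
plain Java with no Lucene dependency; WarmedHolder, opener, and warmer are
hypothetical names. In a Lucene application the opener would presumably open
a new IndexSearcher, and the warmer would run one sorted query so the sort
cache is populated before the searcher serves any request.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Holds one shared instance (e.g. a searcher) and replaces it only after
 * the replacement has been warmed. All names here are illustrative.
 */
final class WarmedHolder<S> {
    // Assumption: opens a fresh searcher over the current state of the index.
    private final Supplier<S> opener;
    // Assumption: runs representative (e.g. sorted) queries to fill caches.
    private final Consumer<S> warmer;
    private final AtomicReference<S> current = new AtomicReference<>();

    WarmedHolder(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        refresh(); // the first instance is warmed before anyone can see it
    }

    /** Called by request threads; never waits for warming. */
    S get() {
        return current.get();
    }

    /**
     * Called from a background thread. Opens and warms a new instance,
     * then swaps it in. Returns the replaced instance (null on first call)
     * so the caller can release it once in-flight requests are done.
     */
    synchronized S refresh() {
        S fresh = opener.get();
        warmer.accept(fresh);           // pay the cache-building cost here
        return current.getAndSet(fresh); // requests now see the warm instance
    }
}
```

A background timer could call refresh() at whatever interval matches your
tolerance for stale results. Requests keep using the old instance while the
new one warms, which is the trade-off between timeliness and sort speed
described above.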


On Mar 20, 2007, at 4:31 PM, David Seltzer wrote:

> Erik,
>
> I'm not using a cached IndexSearcher. Is this an option in an
> environment where the underlying index changes on a second-by-second
> basis? At what layer would a cached IndexSearcher be cached? At the
> Tomcat layer?
>
> Caching at the object layer seems like it might help, but it doesn't
> address my underlying concern, i.e. the relative performance difference
> between natural order and sorted order. Maybe you're right - and I
> shouldn't be worried about the very first search against the index.
>
> How would a cached searcher implementation look?
>
> -Dave
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Tuesday, March 20, 2007 4:03 PM
> To: java-user@lucene.apache.org
> Subject: Re: Sort Performance Question
>
> Are you using a cached IndexSearcher such that successive sorts on
> the same field will be more efficient?
>
> 	Erik
>
>
> On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:
>
>> Hi All,
>>
>>
>>
>> I have a sort performance question:
>>
>>
>>
>> I have a fairly large index consisting of chunks of full-text
>> transcriptions of television, radio and other media, and I'm trying to
>> make it searchable and sortable by date.  The search front-end uses a
>> ParallelMultiSearcher to search up to three indexes at a time (each
>> index contains a month of live data). When I search for the word "toast"
>> (for example) sorted by score, the results come back in about 1200ms;
>> when I sort by DateTime, the results come back in 3900ms.
>>
>>
>>
>> Initially I was sorting based on a unixtime field, but having read up on
>> it, I switched to a slightly easier format: "yyyyMMddHHmm". Now this
>> value is still larger than an int, so I went one step further and
>> created two more fields for test purposes: SortDate, which is yyyyMMdd,
>> and SortTime, which is HHmm. When I sort by SortDate then SortTime the
>> results come in even slower, around 4300ms.
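
[Editor's sketch] The int-range point can be checked with a few lines of
plain Java. This is only an illustration: SortKeyRange and its methods are
hypothetical helpers, and the timestamp used in the usage note is this
thread's reply time, chosen arbitrarily.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SortKeyRange {
    /** True if the decimal string fits in a 32-bit signed int. */
    static boolean fitsInInt(String digits) {
        long v = Long.parseLong(digits);
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    /**
     * Formats a date in UTC with the given SimpleDateFormat pattern.
     * Note: in SimpleDateFormat, "dd" is day of month and "DD" is day of year.
     */
    static String format(String pattern, Date d) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }
}
```

For 20 Mar 2007 20:36 UTC (new Date(1174422962000L)), "yyyyMMddHHmm" gives
200703202036, which exceeds Integer.MAX_VALUE (2147483647), while "yyyyMMdd"
(20070320) and "HHmm" (2036) both fit in an int.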
>>
>>
>>
>> To summarize:
>>
>>
>>
>> //The sorting fields look like this:
>>
>> new Field("SortDateTime", sdfDateTime.format(dMySortDateTime),
>>     Field.Store.YES, Field.Index.UN_TOKENIZED);
>>
>> new Field("SortDate", sdfDate.format(dMySortDateTime),
>>     Field.Store.YES, Field.Index.UN_TOKENIZED);
>>
>> new Field("SortTime", sdfTime.format(dMySortDateTime),
>>     Field.Store.YES, Field.Index.UN_TOKENIZED);
>>
>>
>>
>> //and the performance looks like this:
>>
>>
>>
>> //sort by score
>>
>> Sort sSortOrder = Sort.RELEVANCE; //1200ms
>>
>>
>>
>> //sort by datetime
>>
>> Sort sSortOrder = new Sort("SortDateTime", true); //3900ms
>>
>>
>>
>> //sort by date then time
>>
>> Sort sSortOrder = new Sort(new SortField[] {
>>     new SortField("SortDate", SortField.INT, bReverse),
>>     new SortField("SortTime", SortField.INT, bReverse)}); //4300ms
>>
>>
>>
>>
>>
>> The two indexes that are being searched at the moment look like this:
>>
>>
>>
>> Index 1:
>>
>> Index Path: /storage/unisearch/MMS_index/2007.02/
>>
>> Index Size on Disk: 1,400,569 KB
>>
>> Number of Records: 2682238
>>
>> Index Version: 03/13/2007
>>
>>
>>
>> Index 2:
>>
>> Index Path: /storage/unisearch/MMS_index/2007.03/
>>
>> Index Size on Disk: 2,055,199 KB
>>
>> Number of Records: 3457434
>>
>> Index Version: 03/13/2007
>>
>>
>>
>> The search is being performed in tomcat and I'm running:
>> org.apache.lucene - build 2007-02-14 on a Dual 3.4GHz Xeon w/ 2GB
>> memory
>> and Red Hat 3.4.3-22.
>>
>>
>>
>> So, on to the question: is this fast, slow, or normal?
>>
>>
>>
>> Along with the obvious follow-up: if it's slow, how can I make it
>> faster?
>>
>>
>>
>> Thanks for your help!
>>
>>
>>
>> -Dave
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



