lucene-java-user mailing list archives

From "David Seltzer"
Subject Sort Performance Question
Date Tue, 20 Mar 2007 19:39:50 GMT
Hi All,


I have a sort performance question:


I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date.  The search front-end uses a
ParallelMultiSearcher to search up to three indexes at a time (each
index contains a month of live data). When I search for the word "toast"
(for example) sorted by score, the results come back in about 1200ms.
When I sort by DateTime, the results come back in 3900ms.


Initially I was sorting based on a unixtime field, but having read up on
it, I switched to a slightly easier format: "yyyyMMddHHmm". Now this
value is still larger than an int, so I went one step further and
created two more fields for test purposes: SortDate, which is yyyyMMdd,
and SortTime, which is HHmm. When I sort by SortDate then SortTime, the
results come in even slower, around 4300ms.
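
To make the "larger than an int" point concrete, here is a minimal sketch that formats this message's timestamp with each pattern and checks whether the resulting number fits in a 32-bit int. The class and helper names (SortKeyRange, asNumber, fitsInInt) are made up for illustration, and UTC is assumed for formatting.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SortKeyRange {
    // Hypothetical helper: format the date with the pattern (in UTC)
    // and read the resulting digits as a long.
    static long asNumber(String pattern, Date d) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return Long.parseLong(sdf.format(d));
    }

    // True if the value can be stored in a 32-bit signed int.
    static boolean fitsInInt(long v) {
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 2007-03-20 19:39:50 GMT, expressed as milliseconds since the epoch
        Date d = new Date(1174419590000L);
        for (String pattern : new String[] {"yyyyMMddHHmm", "yyyyMMdd", "HHmm"}) {
            long v = asNumber(pattern, d);
            System.out.println(pattern + " -> " + v + " fits in int: " + fitsInInt(v));
        }
    }
}
```

The combined yyyyMMddHHmm value (200703201939) is above Integer.MAX_VALUE (2147483647), while the separate yyyyMMdd and HHmm values both fit.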


To summarize:


//The sorting fields look like this:

new Field("SortDateTime", sdfDateTime.format(dMySortDateTime),
Field.Store.YES, Field.Index.UN_TOKENIZED);

new Field("SortDate", sdfDate.format(dMySortDateTime),  Field.Store.YES,
Field.Index.UN_TOKENIZED);

new Field("SortTime", sdfTime.format(dMySortDateTime),  Field.Store.YES,
Field.Index.UN_TOKENIZED);


//and the performance looks like this:


//sort by score

Sort sSortOrder = Sort.RELEVANCE; //1200ms


//sort by datetime

Sort sSortOrder = new Sort("SortDateTime", true); //3900ms


//sort by date then time

Sort sSortOrder = new Sort(new SortField[] {
    new SortField("SortDate", SortField.INT, bReverse),
    new SortField("SortTime", SortField.INT, bReverse)}); //4300ms
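
For reference, timings like these can be collected with a small harness along the following lines. This is only a sketch: SearchTimer, timeMillis and warmMedian are hypothetical names, and the Runnable in main is a placeholder standing in for the real search call. It reports the first run separately on the assumption that a first sorted search may include one-time setup cost.

```java
import java.util.Arrays;

public class SearchTimer {
    // Runs the task `runs` times and returns each elapsed time in milliseconds.
    static long[] timeMillis(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return elapsed;
    }

    // Median of all runs after the first (upper middle for even counts).
    // Needs at least two runs; the first run is treated as warm-up.
    static long warmMedian(long[] elapsed) {
        long[] warm = Arrays.copyOfRange(elapsed, 1, elapsed.length);
        Arrays.sort(warm);
        return warm[warm.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would execute the search
        // with the Sort under test.
        Runnable search = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) System.out.println(sum);
        };
        long[] elapsed = timeMillis(search, 5);
        System.out.println("first run: " + elapsed[0] + "ms");
        System.out.println("median of later runs: " + warmMedian(elapsed) + "ms");
    }
}
```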



The two indexes that are being searched at the moment look like this:


Index 1:

Index Path: /storage/unisearch/MMS_index/2007.02/

Index Size on Disk: 1,400,569 KB

Number of Records: 2682238

Index Version: 03/13/2007


Index 2:

Index Path: /storage/unisearch/MMS_index/2007.03/

Index Size on Disk: 2,055,199 KB

Number of Records: 3457434

Index Version: 03/13/2007


The search is being performed in tomcat and I'm running:
org.apache.lucene - build 2007-02-14 on a Dual 3.4GHz Xeon w/ 2GB memory
and Red Hat 3.4.3-22.


So, on to the question: is this fast, slow, or normal?


Along with the obvious follow-up: if it's slow, how can I make it
faster?


Thanks for your help!


