lucene-java-user mailing list archives

From "Avi Levy" <l...@wesee.com>
Subject How to improve retrieval time when searching for a date range
Date Tue, 09 Apr 2013 14:51:01 GMT
Hello,

I have a Lucene.NET index created with version 2.9.4.1. The index holds
about 25 million entries (in production it will hold 50 million or more) and
is 5.75 GB in size. It is used for text search. I need to add new
functionality that lets the user restrict a text search to a specific date
range (the query is text AND date range). The user can select one of two
date ranges: the last 7 days or the last 30 days.

The implementation I tried adds a new indexed-only (not stored) numeric
field representing the date. The date is indexed as an integer in the format
yyyyMMdd. I index this field with a precision step of 1 (to make retrieval
as fast as possible). At search time I create a Boolean query that contains
the original query, and I add the date range to it as a MUST clause.

When I compare these queries with regular text-only queries, they are much
slower. To compare, I ran 10 warm-up queries (whose timings I discard) and
then another 90 queries whose timings I record.
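
For readers who want to reproduce this kind of measurement, here is a
minimal sketch of a timing harness with a warm-up phase. It is not the
actual test code behind the numbers below. The names SearchTimer, Measure
and CountAbove are hypothetical, the search delegate is a placeholder for
the real Lucene.NET search call, and timings are assumed to be in
milliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SearchTimer
{
    // Runs `search` once per query and returns the elapsed milliseconds of
    // every run after the first `warmUp` runs, which are executed but not
    // recorded.
    public static List<long> Measure(IList<string> queries,
                                     Action<string> search,
                                     int warmUp)
    {
        var timings = new List<long>();
        for (int i = 0; i < queries.Count; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            search(queries[i]);
            watch.Stop();
            if (i >= warmUp)
            {
                timings.Add(watch.ElapsedMilliseconds);
            }
        }
        return timings;
    }

    // Counts the timings strictly above the given threshold.
    public static int CountAbove(IEnumerable<long> timings, long thresholdMs)
    {
        return timings.Count(t => t > thresholdMs);
    }

    public static void Main()
    {
        // 10 warm-up queries followed by 90 measured ones.
        List<string> queries = Enumerable.Range(0, 100)
                                         .Select(i => "query " + i)
                                         .ToList();

        // Placeholder: replace the empty lambda with the real search call.
        List<long> timings = Measure(queries, q => { }, 10);

        Console.WriteLine("Min search time: {0}", timings.Min());
        Console.WriteLine("Max search time: {0}", timings.Max());
        Console.WriteLine("Average search time: {0}", timings.Average());
        Console.WriteLine("Number of results above 700ms: {0}",
                          CountAbove(timings, 700));
    }
}
```

Running the warm-up and the measured queries in the same loop keeps caches
and JIT state comparable between the two phases.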

I would appreciate suggestions and tips on how the performance of searching
by date can be improved.

You can see below the statistics for the runs, and the code for creating the
fields and the query.

Thanks,
Avi

No changes (using index with no dates)
08 18:17:01,213 [1]  INFO: {(null)} - Min search time: 2
08 18:17:01,213 [1]  INFO: {(null)} - Max search time: 88
08 18:17:01,213 [1]  INFO: {(null)} - Average search time: 23.0674157303371
08 18:17:01,213 [1]  INFO: {(null)} - Search time Variance : 20.5
08 18:17:01,213 [1]  INFO: {(null)} - Number of results above 700ms: 0

Index With Date (not using dates in query)
08 18:22:49,093 [1]  INFO: {(null)} - Min search time: 3
08 18:22:49,093 [1]  INFO: {(null)} - Max search time: 176
08 18:22:49,093 [1]  INFO: {(null)} - Average search time: 50.9325842696629
08 18:22:49,093 [1]  INFO: {(null)} - Search time Variance : 46.85
08 18:22:49,093 [1]  INFO: {(null)} - Number of results above 700ms: 0

With Dates - Last 7 Days
08 19:38:17,988 [1]  INFO: {(null)} - Min search time: 33
08 19:38:17,988 [1]  INFO: {(null)} - Max search time: 1668
08 19:38:17,988 [1]  INFO: {(null)} - Average search time: 704.741573033708
08 19:38:17,988 [1]  INFO: {(null)} - Search time Variance : 607.05
08 19:38:17,988 [1]  INFO: {(null)} - Number of results above 700ms: 44

With Dates - Last 30 Days
08 19:48:17,123 [1]  INFO: {(null)} - Min search time: 105
08 19:48:17,123 [1]  INFO: {(null)} - Max search time: 4808
08 19:48:17,123 [1]  INFO: {(null)} - Average search time: 2846.75280898876
08 19:48:17,123 [1]  INFO: {(null)} - Search time Variance : 1934.11
08 19:48:17,123 [1]  INFO: {(null)} - Number of results above 700ms: 72

Here are the field's definitions:

var idField = new Field( "ID", String.Empty, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS );
document.Add( idField );

var id2Field = new Field( "ID2", String.Empty, Field.Store.YES, Field.Index.NO );
document.Add( id2Field );

var txtField = new Field( "txtField", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txtField );

var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txt2Field );

var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txt3Field );

// The new date field: precision step 1, not stored, indexed
var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
document.Add( dateField );

 

I then set the values of the fields. The new date field is set like this:

Int32 dateInt = <some date>;   // yyyyMMdd integer; SetIntValue takes a 32-bit int
dateField.SetIntValue(dateInt);
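
As an illustration of the yyyyMMdd encoding, here is a minimal sketch of a
helper that turns a DateTime into such an integer, using the same arithmetic
as the query code in this message. The class name DateKey and the method
name FromDate are hypothetical and not part of Lucene.NET.

```csharp
using System;

public static class DateKey
{
    // Encodes a date as a yyyyMMdd integer, e.g. 9 April 2013 -> 20130409.
    public static int FromDate(DateTime date)
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    public static void Main()
    {
        DateTime now = DateTime.UtcNow;

        // Upper bound (today) and the two lower bounds the user can select.
        int endKey = FromDate(now);
        int last7Key = FromDate(now.AddDays(-7));
        int last30Key = FromDate(now.AddDays(-30));

        Console.WriteLine("Last 7 days:  {0} to {1}", last7Key, endKey);
        Console.WriteLine("Last 30 days: {0} to {1}", last30Key, endKey);
    }
}
```

Because the encoding keeps year, month and day in decreasing order of
significance, comparing two keys as integers gives the same order as
comparing the dates, which is what lets a numeric range stand in for a date
range.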

 

The query:

var fields = new String[3];
Dictionary<String, Single> boosts = new Dictionary<String, Single>();

fields[0] = "txtField";
boosts.Add( fields[0], <Value> );
fields[1] = "txt2Field";
boosts.Add( fields[1], <Value> );
fields[2] = "txt3Field";
boosts.Add( fields[2], <Value> );

MultiFieldQueryParser parser = new MultiFieldQueryParser( Version.LUCENE_29, fields, analyzer, boosts );

// The original text query is a required clause
var boolQuery = new BooleanQuery();
Query simpleParsedQuery = parser.Parse( queryText );
boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );

// Range bounds encoded as yyyyMMdd integers
DateTime beginDate = <Date 7 or 30 days ago>;
Int32 beginDateInt = beginDate.Day + beginDate.Month * 100 + beginDate.Year * 10000;

DateTime now = DateTime.UtcNow;
Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;

// The date range is a second required clause (both bounds inclusive)
NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange( "Date", beginDateInt, endDateInt, true, true );
boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );

 

