lucenenet-user mailing list archives

From "Moray McConnachie" <mmcco...@oxford-analytica.com>
Subject RE: How to improve retrieval time when searching for a date range
Date Tue, 09 Apr 2013 15:16:47 GMT
Have you experimented with using strings with no analyser instead of
numerics? Apologies if this was in your original post.

This is what we do (in an older version of Lucene), though I haven't run
comparisons on it and I have no idea if it's best practice. But date
strings in yyyyMMdd format behave just fine, as they sort in date order
and can be used in range queries.

You sacrifice some index size.
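A minimal C# sketch of why the string approach works, assuming dates are written with a fixed-width yyyyMMdd format into an un-analysed field. The class and method names (`DateStrings`, `ToKey`, `InRange`) are hypothetical and no Lucene API is involved; the point is only that ordinal comparison of these keys gives the same order as the dates, which is what a term range over such a field relies on.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating yyyyMMdd string keys.
// Fixed width + most-significant-part-first means ordinal string order == date order.
public static class DateStrings
{
    // Format a date as an 8-character yyyyMMdd key, independent of the current culture.
    public static string ToKey(DateTime date)
    {
        return date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // Inclusive range check using ordinal (code-unit) comparison.
    // For ASCII digits this matches how the keys would be ordered as index terms.
    public static bool InRange(string key, string lower, string upper)
    {
        return string.CompareOrdinal(key, lower) >= 0
            && string.CompareOrdinal(key, upper) <= 0;
    }
}
```

For example, `DateStrings.ToKey(new DateTime(2013, 4, 9))` yields `"20130409"`, and a year boundary such as 20121231 vs 20130101 still compares in the right order.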

Yours,
Moray


-----Original Message-----
From: Avi Levy [mailto:levy@wesee.com] 
Sent: 09 April 2013 15:58
To: user@lucenenet.apache.org
Subject: How to improve retrieval time when searching for a date range

Hello,

I have a Lucene.NET index created with version 2.9.4.1. The index holds
about 25 million entries (in the production environment I will have 50
million or more) and is 5.75 GB in size. The index is used for text
search. I need to add new functionality that allows querying for a
specific date range in addition to the text search (the query is text
AND date range). The user can select either the last 7 days or the last
30 days.

The implementation I tried was to add a new indexed-only numeric field
representing the date. The date is indexed as an integer in the format
yyyyMMdd. I am indexing this field with a precision step of 1 (to make
retrieval as fast as possible). At search time I create a BooleanQuery
that contains the original query, and I add a MUST clause for the date
range.
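The yyyyMMdd integer encoding can be sketched in plain C# as follows, using the same arithmetic as the query code further down (`Day + Month * 100 + Year * 10000`). The `DateInts` class and its methods are hypothetical names for illustration. The encoded value is not a day count, but it increases strictly with the date, which is all a numeric range needs, and it fits comfortably in an `Int32`.

```csharp
using System;

// Hypothetical helper for the yyyyMMdd integer encoding.
public static class DateInts
{
    // 2013-04-09 -> 20130409. Strictly increasing with the date,
    // but not contiguous: 20130401 - 20130331 == 70, not 1.
    public static int ToDateInt(DateTime date)
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    // Inverse mapping, handy for checking or displaying stored values.
    public static DateTime FromDateInt(int value)
    {
        return new DateTime(value / 10000, (value / 100) % 100, value % 100);
    }
}
```

Because the values are not contiguous, a range such as 20130325 to 20130405 covers only 12 real days even though the numeric span is much wider; that is harmless for correctness.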

A few days ago I posted a question and got some useful suggestions. I
have reached a point where search times on the index that contains the
dates are acceptable compared with the index without them, as long as
the query does not filter by date. The problem I am facing now is that
queries that do include the date range are slow. I would appreciate
suggestions and tips on how the performance of searching by date can be
improved.

You can see below the statistics for the runs, and the code for creating
the fields and the query.

Thanks,
Avi

No changes (using index with no dates)
08 18:17:01,213 [1]  INFO: {(null)} - Min search time: 2
08 18:17:01,213 [1]  INFO: {(null)} - Max search time: 88
08 18:17:01,213 [1]  INFO: {(null)} - Average search time: 23.0674157303371
08 18:17:01,213 [1]  INFO: {(null)} - Search time Variance : 20.5
08 18:17:01,213 [1]  INFO: {(null)} - Number of results above 700ms: 0

Index With Date (not using dates in query)
08 18:22:49,093 [1]  INFO: {(null)} - Min search time: 3
08 18:22:49,093 [1]  INFO: {(null)} - Max search time: 176
08 18:22:49,093 [1]  INFO: {(null)} - Average search time: 50.9325842696629
08 18:22:49,093 [1]  INFO: {(null)} - Search time Variance : 46.85
08 18:22:49,093 [1]  INFO: {(null)} - Number of results above 700ms: 0

With Dates - Last 7 Days
08 19:38:17,988 [1]  INFO: {(null)} - Min search time: 33
08 19:38:17,988 [1]  INFO: {(null)} - Max search time: 1668
08 19:38:17,988 [1]  INFO: {(null)} - Average search time: 704.741573033708
08 19:38:17,988 [1]  INFO: {(null)} - Search time Variance : 607.05
08 19:38:17,988 [1]  INFO: {(null)} - Number of results above 700ms: 44

With Dates - Last 30 Days
08 19:48:17,123 [1]  INFO: {(null)} - Min search time: 105
08 19:48:17,123 [1]  INFO: {(null)} - Max search time: 4808
08 19:48:17,123 [1]  INFO: {(null)} - Average search time: 2846.75280898876
08 19:48:17,123 [1]  INFO: {(null)} - Search time Variance : 1934.11
08 19:48:17,123 [1]  INFO: {(null)} - Number of results above 700ms: 72

Here are the field definitions:

var idField = new Field( "ID", String.Empty, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS );
document.Add( idField );

var id2Field = new Field( "ID2", String.Empty, Field.Store.YES, Field.Index.NO );
document.Add( id2Field );

var txtField = new Field( "txtField", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txtField );

var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txt2Field );

var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
document.Add( txt3Field );

// The new date field: precisionStep 1, not stored, indexed
var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
document.Add( dateField );

I set the field values. For the new date field I set it like this:

// yyyyMMdd fits in an Int32, and SetIntValue takes an int (an Int64 would not compile here)
Int32 dateInt = <some date>;
dateField.SetIntValue( dateInt );

The query:

var fields = new String[3];
Dictionary<String, Single> boosts = new Dictionary<String, Single>();

fields[0] = "txtField";
boosts.Add( fields[0], <Value> );
fields[1] = "txt2Field";
boosts.Add( fields[1], <Value> );
fields[2] = "txt3Field";
boosts.Add( fields[2], <Value> );

MultiFieldQueryParser parser = new MultiFieldQueryParser( Version.LUCENE_29, fields, analyzer, boosts );

var boolQuery = new BooleanQuery();
Query simpleParsedQuery = parser.Parse( queryText );
boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );

DateTime beginDate = <Date 7 or 30 days ago>;
Int32 beginDateInt = beginDate.Day + beginDate.Month * 100 + beginDate.Year * 10000;

DateTime now = DateTime.UtcNow;
Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;

// Note: this overload uses the default precisionStep (4), not the step of 1 used at index time
NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange( "Date", beginDateInt, endDateInt, true, true );

boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );
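The snippet above leaves the source of `beginDate` open while taking the end date from `DateTime.UtcNow`. A minimal sketch, assuming both bounds should come from the same UTC "today", could compute them together. `LastNDays` and `Bounds` are hypothetical names, not part of Lucene.NET; the resulting integers would be passed to `NewIntRange` unchanged.

```csharp
using System;

// Hypothetical helper: inclusive yyyyMMdd bounds for "the last N days",
// both derived from a single UTC timestamp so the two ends cannot disagree.
public static class LastNDays
{
    public static int ToDateInt(DateTime date)
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    // Returns (lower, upper): lower = today - days, upper = today.
    // With both ends inclusive this spans days + 1 calendar days.
    public static Tuple<int, int> Bounds(DateTime utcNow, int days)
    {
        DateTime today = utcNow.Date;           // drop the time of day
        DateTime begin = today.AddDays(-days);  // handles month/year rollover
        return Tuple.Create(ToDateInt(begin), ToDateInt(today));
    }
}
```

For instance, `LastNDays.Bounds(DateTime.UtcNow, 7)` on 9 April 2013 gives (20130402, 20130409). If "last 7 days" should mean exactly seven calendar days, subtract `days - 1` instead.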



