lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to improve retrieval time when searching for a date range
Date Tue, 09 Apr 2013 14:58:01 GMT
Hi,

A precision step of 1 is not necessarily the fastest (see the Lucene javadocs; Lucene.NET
should behave similarly). Try the default, 4, first. In general, range queries will always be
slower than text-only queries, because there is much more work to do (more terms, more documents, ...).
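
To make the trade-off concrete, here is a minimal sketch of the term-count arithmetic described in the Lucene javadocs for NumericField and NumericRangeQuery (worth checking against the version in use). The class and method names are hypothetical helpers, not Lucene API. A smaller precision step lowers the upper bound on terms a range query visits, but raises the number of terms indexed for every value, so the index grows and a step of 1 does not automatically win.

```java
/** Hypothetical helper for illustration only; not part of Lucene or Lucene.NET. */
final class PrecisionStepSketch {

    /** Terms indexed per numeric value: ceil(bitsPerValue / precisionStep). */
    static int termsPerValue(int bitsPerValue, int precisionStep) {
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    /**
     * Upper bound on the number of terms a numeric range query may visit:
     * (bits/step - 1) * (2^step - 1) * 2 + (2^step - 1).
     * This form only holds when precisionStep divides bitsPerValue evenly.
     */
    static long maxRangeTerms(int bitsPerValue, int precisionStep) {
        if (precisionStep <= 0 || bitsPerValue % precisionStep != 0) {
            throw new IllegalArgumentException("precisionStep must evenly divide bitsPerValue");
        }
        long levels = bitsPerValue / precisionStep;
        long perLevel = (1L << precisionStep) - 1;
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        // 32-bit int field, as used for the yyyyMMdd date in this thread.
        for (int step : new int[] {1, 2, 4, 8}) {
            System.out.printf("step=%d termsPerValue=%d maxRangeTerms=%d%n",
                    step, termsPerValue(32, step), maxRangeTerms(32, step));
        }
    }
}
```

For a 32-bit field, a step of 1 indexes 32 terms per value against 8 terms for the default of 4. These figures are bounds, not timings, so the only reliable comparison is to re-index with each step and measure.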

This question is specific to Lucene.NET, so I would ask it on their mailing list.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Avi Levy [mailto:levy@wesee.com]
> Sent: Tuesday, April 09, 2013 4:51 PM
> To: java-user@lucene.apache.org
> Subject: How to improve retrieval time when searching for a date range
> 
> Hello,
> 
> I have a Lucene.NET index created with version 2.9.4.1. The index holds
> about 25 million entries (in the production environment I will have
> 50 million or more) and is 5.75 GB in size. The index is used for searching
> by text. I need to add new functionality that allows querying for
> a specific date range in addition to the textual search (the query is for text
> AND date range). The date range the user can select is either the last 7 days
> or the last 30 days.
> 
> The implementation I tried was to add a new indexed-only numeric field
> representing a date. The date is indexed as an integer in the format yyyyMMdd.
> I am indexing this field with a precision step of 1 (intending to make retrieval
> as fast as possible). During retrieval I create a Boolean query that contains the
> original query, and I add a MUST clause for the date range.
> 
> When I compare the results to regular textual queries, I see much slower
> search times. For each comparison I ran 10 queries for warm-up (whose results
> I don't count), then another 90 queries whose results I do count.
> 
> I would appreciate suggestions and tips on how the performance of searching
> by dates can be improved.
> 
> You can see below the statistics for the runs, and the code for creating the
> fields and the query.
> 
> Thanks,
> Avi
> 
> No changes (using index with no dates)
> 08 18:17:01,213 [1]  INFO: {(null)} - Min search time: 2
> 08 18:17:01,213 [1]  INFO: {(null)} - Max search time: 88
> 08 18:17:01,213 [1]  INFO: {(null)} - Average search time: 23.0674157303371
> 08 18:17:01,213 [1]  INFO: {(null)} - Search time Variance : 20.5
> 08 18:17:01,213 [1]  INFO: {(null)} - Number of results above 700ms: 0
> 
> Index With Date (not using dates in query)
> 08 18:22:49,093 [1]  INFO: {(null)} - Min search time: 3
> 08 18:22:49,093 [1]  INFO: {(null)} - Max search time: 176
> 08 18:22:49,093 [1]  INFO: {(null)} - Average search time: 50.9325842696629
> 08 18:22:49,093 [1]  INFO: {(null)} - Search time Variance : 46.85
> 08 18:22:49,093 [1]  INFO: {(null)} - Number of results above 700ms: 0
> 
> With Dates - Last 7 Days
> 08 19:38:17,988 [1]  INFO: {(null)} - Min search time: 33
> 08 19:38:17,988 [1]  INFO: {(null)} - Max search time: 1668
> 08 19:38:17,988 [1]  INFO: {(null)} - Average search time: 704.741573033708
> 08 19:38:17,988 [1]  INFO: {(null)} - Search time Variance : 607.05
> 08 19:38:17,988 [1]  INFO: {(null)} - Number of results above 700ms: 44
> 
> With Dates - Last 30 Days
> 08 19:48:17,123 [1]  INFO: {(null)} - Min search time: 105
> 08 19:48:17,123 [1]  INFO: {(null)} - Max search time: 4808
> 08 19:48:17,123 [1]  INFO: {(null)} - Average search time: 2846.75280898876
> 08 19:48:17,123 [1]  INFO: {(null)} - Search time Variance : 1934.11
> 08 19:48:17,123 [1]  INFO: {(null)} - Number of results above 700ms: 72
> 
> Here are the field definitions:
> 
> var idField = new Field( "ID", String.Empty, Field.Store.YES,
>     Field.Index.NOT_ANALYZED_NO_NORMS );
> document.Add( idField );
> 
> var id2Field = new Field( "ID2", String.Empty, Field.Store.YES, Field.Index.NO );
> document.Add( id2Field );
> 
> var txtField = new Field( "txtField", String.Empty, Field.Store.NO,
>     Field.Index.ANALYZED );
> document.Add( txtField );
> 
> var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO,
>     Field.Index.ANALYZED );
> document.Add( txt2Field );
> 
> var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO,
>     Field.Index.ANALYZED );
> document.Add( txt3Field );
> 
> 
> 
> // The new date field
> 
> var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
> document.Add(dateField);
> 
> 
> 
> I set the values on the fields. For the new date field I set it like this:
> 
> Int32 dateInt = <some date>;  // yyyyMMdd; SetIntValue takes a 32-bit int
> 
> dateField.SetIntValue(dateInt);
> 
> 
> 
> The query:
> 
> var fields = new String[3];
> 
> Dictionary<String, Single> boosts = new Dictionary<String, Single>();
> 
> fields[0]="txtField";
> 
> boosts.Add( fields[0],<Value>);
> 
> fields[1]="txt2Field";
> 
> boosts.Add( fields[1],<Value>);
> 
> fields[2]="txt3Field";
> 
> boosts.Add( fields[2],<Value>);
> 
> MultiFieldQueryParser parser = new MultiFieldQueryParser(
>     Version.LUCENE_29, fields, analyzer, boosts );
> var boolQuery = new BooleanQuery();
> Query simpleParsedQuery = parser.Parse( queryText );
> boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );
> 
> DateTime beginDate = <date 7 or 30 days ago>;
> Int32 beginDateInt = beginDate.Day + beginDate.Month * 100 +
>     beginDate.Year * 10000;
> 
> DateTime now = DateTime.UtcNow;
> 
> Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;
> 
> NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange(
> "Date", beginDateInt, endDateInt, true, true );
> 
> boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );
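
The yyyyMMdd bounds used in the query above can be sketched in isolation. This is a minimal Java illustration of the same arithmetic as the quoted C# code. The class and method names are hypothetical, and it assumes the indexed field holds the same yyyyMMdd integer.

```java
import java.time.LocalDate;

/** Hypothetical helper for illustration only; mirrors the quoted date arithmetic. */
final class DateRangeBoundsSketch {

    /** Encodes a date as the integer yyyyMMdd, e.g. 2013-04-09 becomes 20130409. */
    static int toYyyyMmDd(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    /** Returns {lower, upper} inclusive bounds covering the last `days` days up to `today`. */
    static int[] lastDays(LocalDate today, int days) {
        return new int[] { toYyyyMmDd(today.minusDays(days)), toYyyyMmDd(today) };
    }

    public static void main(String[] args) {
        int[] week = lastDays(LocalDate.of(2013, 4, 9), 7);
        int[] month = lastDays(LocalDate.of(2013, 4, 9), 30);
        System.out.println(week[0] + " .. " + week[1]);
        System.out.println(month[0] + " .. " + month[1]);
    }
}
```

The encoding is not contiguous: a range that crosses a month boundary spans integers that are not valid dates. This does not affect correctness, since only values that were actually indexed can match.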
> 
> 


