lucenenet-user mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Re: How to improve retrieval time when searching for a date range
Date Wed, 10 Apr 2013 14:16:33 GMT
Did you try using a filter as I suggested? A range query, any range query,
is going to be rather expensive as you make its range larger.
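A minimal sketch of what the filter suggestion could look like, assuming the Lucene.Net 2.9.4 API (`NumericRangeFilter.NewIntRange`, `CachingWrapperFilter`, `Searcher.Search(Query, Filter, int)`) and the "Date" field indexed with precision step 1 as described in the quoted message. The class `DateRangeFilters` and its per-window cache are hypothetical. The text query stays the only scored query, and the date range only restricts which documents can match. The wrapped filter is cached so repeated searches over the same window on the same reader reuse the matching document set.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;

// Hypothetical helper (not from this thread): applies the date range as a
// cached filter instead of a scored MUST clause in the BooleanQuery.
public static class DateRangeFilters
{
    private static readonly Dictionary<string, Filter> Cache =
        new Dictionary<string, Filter>();
    private static readonly object Sync = new object();

    // beginDateInt and endDateInt are yyyyMMdd integers, as in the quoted code.
    public static Filter Get(int beginDateInt, int endDateInt)
    {
        string key = beginDateInt + "-" + endDateInt;
        lock (Sync)
        {
            Filter filter;
            if (!Cache.TryGetValue(key, out filter))
            {
                // Precision step 1, the same step the "Date" field was indexed with.
                Filter range = NumericRangeFilter.NewIntRange(
                    "Date", 1, beginDateInt, endDateInt, true, true);
                // CachingWrapperFilter remembers the matching doc ids per IndexReader,
                // so later searches on the same reader skip the range enumeration.
                filter = new CachingWrapperFilter(range);
                Cache[key] = filter;
            }
            return filter;
        }
    }

    // Only the text query is scored; the filter just restricts the candidates.
    public static TopDocs Search(Searcher searcher, Query textQuery,
                                 int beginDateInt, int endDateInt, int maxHits)
    {
        return searcher.Search(textQuery, Get(beginDateInt, endDateInt), maxHits);
    }
}
```

With a hypothetical `searcher` and the `simpleParsedQuery`, `beginDateInt` and `endDateInt` from the quoted code, a search would be `DateRangeFilters.Search(searcher, simpleParsedQuery, beginDateInt, endDateInt, 100)`. Since the 7- and 30-day windows move once a day, the cache gains one entry per window per day; a long-running process would want to drop stale entries.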


On Tue, Apr 9, 2013 at 5:58 PM, Avi Levy <levy@wesee.com> wrote:

> Hello,
>
> I have a Lucene.NET index created with version 2.9.4.1. The index holds
> about 25 million entries (in the production environment I will have
> 50 million or more) and is 5.75 GB in size. The index is used for
> searching by text. I need to add new functionality that allows querying
> for a specific date range in addition to the textual search (the query is
> text AND date range). The user can select either the last 7 days or the
> last 30 days.
>
> The implementation I tried was to add a new indexed-only numeric field
> representing a date. The date is indexed as an integer in the format
> yyyyMMdd. I am indexing this field with a precision step of 1 (to make
> retrieval the fastest). During retrieval I create a BooleanQuery that
> contains the original query, and I added a MUST clause for the date range.
>
> A few days ago I posted a question and got some useful suggestions. I have
> reached a point where search times on the index that contains the dates
> are acceptable compared to the index without them. However, the problem I
> am facing now is that queries that use the dates are slow. I would
> appreciate suggestions and tips on how the performance of searching by
> dates can be improved.
>
> Below are the statistics for the runs and the code for creating the
> fields and the query.
>
> Thanks,
> Avi
>
> Search times (ms); the last column is the number of results above 700ms:
>
> Run                                           Min   Max  Average  Variance  Above 700ms
> No changes (index with no dates)                2    88    23.07     20.50            0
> Index with dates (dates not used in query)      3   176    50.93     46.85            0
> With dates, last 7 days                        33  1668   704.74    607.05           44
> With dates, last 30 days                      105  4808  2846.75   1934.11           72
>
> Here are the field definitions:
>
> var idField = new Field( "ID", String.Empty, Field.Store.YES,
> Field.Index.NOT_ANALYZED_NO_NORMS );
> document.Add( idField );
> var id2Field = new Field( "ID2", String.Empty, Field.Store.YES,
> Field.Index.NO );
> document.Add( id2Field );
>
> var txtField = new Field( "txtField", String.Empty, Field.Store.NO,
> Field.Index.ANALYZED );
> document.Add( txtField );
>
> var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO,
> Field.Index.ANALYZED );
> document.Add( txt2Field );
>
> var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO,
> Field.Index.ANALYZED );
> document.Add( txt3Field );
>
>
>
> // The new date field
>
> var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
> document.Add(dateField);
>
>
>
> I set the field values. For the new date field I set it like this:
>
> Int32 dateInt = <some date>; // the date as a yyyyMMdd integer
>
> dateField.SetIntValue(dateInt);
>
>
>
> The query:
>
> var fields = new String[3];
> Dictionary<String, Single> boosts = new Dictionary<String, Single>();
>
> fields[0] = "txtField";
> boosts.Add( fields[0], <Value> );
> fields[1] = "txt2Field";
> boosts.Add( fields[1], <Value> );
> fields[2] = "txt3Field";
> boosts.Add( fields[2], <Value> );
>
> MultiFieldQueryParser parser = new MultiFieldQueryParser(
>     Version.LUCENE_29, fields, analyzer, boosts );
> var boolQuery = new BooleanQuery();
> Query simpleParsedQuery = parser.Parse( queryText );
> boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );
>
> DateTime beginDate = <Date 7 or 30 days ago>;
> Int32 beginDateInt = beginDate.Day + beginDate.Month * 100
>     + beginDate.Year * 10000;
>
> DateTime now = DateTime.UtcNow;
> Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;
>
> NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange( "Date",
>     beginDateInt, endDateInt, true, true );
> boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );
>
>
>
>
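The yyyyMMdd arithmetic appears twice in the quoted code, once when setting the field value and once when building the query. A minimal sketch of a single helper for both, assuming C# 7 value tuples; `DateKeys`, `ToKey` and `LastDays` are hypothetical names. It mirrors the quoted query code: the window starts `days` days before today and both ends are inclusive.

```csharp
using System;

// Hypothetical helper (not from the original code): one place for the
// yyyyMMdd encoding used both when indexing and when querying.
public static class DateKeys
{
    // 9 April 2013 -> 20130409. Chronological order is preserved, which is
    // what lets a numeric range over these keys select a date range.
    public static int ToKey(DateTime date)
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    // Inclusive [Begin, End] keys: Begin is `days` days before todayUtc,
    // End is todayUtc itself (time of day is ignored), as in the quoted code.
    public static (int Begin, int End) LastDays(int days, DateTime todayUtc)
    {
        DateTime end = todayUtc.Date;
        DateTime begin = end.AddDays(-days);
        return (ToKey(begin), ToKey(end));
    }
}
```

The keys are not contiguous integers (20130331 is followed by 20130401), which does not matter for a numeric range query or filter: only values that actually occur in the index can match.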
