lucenenet-user mailing list archives

From "Avi Levy" <l...@wesee.com>
Subject RE: Retrieval Performance degradation when indexing a numeric field
Date Thu, 04 Apr 2013 13:44:54 GMT
Thanks for the replies. I will try the suggestion for field precision, and
splitting the dates.

The table was not formatted properly.  Here is the data again:
No dates indexed & Not optimized - 79.76 ms
No dates indexed & Optimized - 75.26 ms
Dates indexed & Not optimized - 312.63 ms
Dates indexed & Optimized - 78.40 ms

Here are the field definitions:
var idField = new Field( "ID", String.Empty, Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS );
document.Add( idField );

var id2Field = new Field( "ID2", String.Empty, Field.Store.YES,
Field.Index.NO );
document.Add( id2Field );

var txtField = new Field( "txtField", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txtField );

var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txt2Field );

var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txt3Field );

var dateField = new NumericField( "Date", Field.Store.NO, true );
document.Add(dateField);
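
As a rough sketch of the field-precision suggestion from this thread: NumericField and NumericRangeQuery both have overloads that take an explicit precision step. The class name DateFieldSketch and the value 8 are assumptions made for illustration, not settings that were measured here. A larger step indexes fewer terms per value (the default is 4), at the cost of range queries having to visit more terms.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Search;

// Hypothetical helper, not part of the original code.
static class DateFieldSketch
{
    // Assumption: 8 is an illustrative value only; the library default is 4.
    const int PrecisionStep = 8;

    // Same field as above, but with an explicit precision step.
    public static NumericField CreateDateField()
    {
        return new NumericField( "Date", PrecisionStep, Field.Store.NO, true );
    }

    // A range query on the field should use the same precision step
    // that was used at index time.
    public static Query DatesFrom( long minDate )
    {
        return NumericRangeQuery.NewLongRange( "Date", PrecisionStep,
            minDate, long.MaxValue, true, true );
    }
}
```

Changing the step only affects newly indexed documents, so the index would need to be rebuilt before comparing timings.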

I then set the field values. For the new date field, I set it like this:
Int64 dateInt = <some date>;
dateField.SetLongValue(dateInt);
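
One possible reading of the "splitting the dates" suggestion is to index the day and the time of day as two smaller numeric values instead of one long. The sketch below only shows the arithmetic; the class and method names are hypothetical, and whether this helps retrieval time would have to be measured.

```csharp
using System;

// Hypothetical helper, not part of the original code.
static class DateParts
{
    // Day as yyyyMMdd, for example 2013-04-04 becomes 20130404.
    public static int ToDayValue( DateTime date )
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    // Minutes since midnight, in the range 0..1439.
    public static int ToMinuteOfDay( DateTime date )
    {
        return date.Hour * 60 + date.Minute;
    }
}
```

Each part could then go into its own NumericField through SetIntValue, so that a query by day never has to touch the minute-level terms.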

The query:
var fields = new String[3];
Dictionary<String, Single> boosts = new Dictionary<String, Single>();

fields[0]="txtField";
boosts.Add( fields[0],<Value>);
fields[1]="txt2Field";
boosts.Add( fields[1],<Value>);
fields[2]="txt3Field";
boosts.Add( fields[2],<Value>);
MultiFieldQueryParser parser = new MultiFieldQueryParser( Version.LUCENE_29,
fields, analyzer, boosts );
var boolQuery = new BooleanQuery();
Query simpleParsedQuery = parser.Parse( queryText );
boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );

Notice that I don't search by the date field.
Also, the actual Boolean query is more complex; I left the other parts out
for simplicity.

BTW, when I add another clause with the date to the Boolean query, the
query time gets very bad, at around 300 ms.
NumericRangeQuery datesQuery = NumericRangeQuery.NewLongRange(  "Date",
<Date>, Int64.MaxValue, true, true );
boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );
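
A minimal sketch of the "moving to filters" suggestion from this thread, assuming the date restriction does not need to contribute to scoring: the range is expressed as a NumericRangeFilter and passed to Search next to the query, instead of being added as a MUST clause. The class and method names are hypothetical.

```csharp
using Lucene.Net.Search;

// Hypothetical helper, not part of the original code.
static class DateFilterSketch
{
    // Builds the filter once. CachingWrapperFilter keeps the matching
    // document set per index reader, so the same instance has to be
    // reused across searches for the cache to have any effect.
    public static Filter CreateDateFilter( long minDate )
    {
        Filter range = NumericRangeFilter.NewLongRange( "Date",
            minDate, long.MaxValue, true, true );
        return new CachingWrapperFilter( range );
    }

    // Runs the text query and restricts the hits with the date filter.
    public static TopDocs Search( IndexSearcher searcher, Query query,
        Filter dateFilter, int maxHits )
    {
        return searcher.Search( query, dateFilter, maxHits );
    }
}
```

Caching only pays off when the same lower bound is reused across many searches; a bound that changes per query would build a new document set every time.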


-----Original Message-----
From: itamar.synhershko@gmail.com [mailto:itamar.synhershko@gmail.com] On
Behalf Of Itamar Syn-Hershko
Sent: Thursday, April 04, 2013 10:07 AM
To: user@lucenenet.apache.org; Maxim Terletsky
Subject: Re: Retrieval Performance degradation when indexing a numeric field

Not sure that I'm following. Can you show an example of a Document and a
Query?


On Thu, Apr 4, 2013 at 8:19 AM, Maxim Terletsky <sxamt@yahoo.com> wrote:

> Hi,
> We deliberately left the queries the same for both indexes, the one 
> with date field indexed and the one without. On both indexes the query 
> didn't include the date field, only some string field.
>
>
>
> Maxim
>
>
> ________________________________
>  From: Itamar Syn-Hershko <itamar@code972.com>
> To: user@lucenenet.apache.org
> Sent: Wednesday, April 3, 2013 8:57 PM
> Subject: Re: Retrieval Performance degradation when indexing a numeric 
> field
>
> What type of queries? This could happen, yes
>
> Try playing with field precision, and moving to filters where possible
>
> On Apr 3, 2013 8:33 PM, "Avi Levy" <levy@wesee.com> wrote:
>
> > Hello,
> >
> > I have a Lucene.NET index created with version 2.9.4.1. I have re-indexed
> > the index from scratch and added a numeric field to the index representing
> > a date. The field is not stored. The numeric value represents a date
> > in the format of yyyyMMddhhmm.
> >
> > I noticed that when I use queries on the index they take 
> > significantly longer.
> > Below you can see a table for the average query time.
> > Each run was of 3000 different queries, all running one after the 
> > other with a short sleep between them.
> >
> >
> >                  No dates indexed    Dates indexed
> > Not optimized    79.76 ms            312.63 ms
> > Optimized        75.26 ms            78.40 ms
> >
> > I optimized by setting these parameters: UseCompoundFile = false,
> > RamBufferSize = 200, TermIndexInterval = 16.
> >
> > Is this an expected behavior?
> > Is there something I can do to improve the performance of 
> > non-optimized index?
> >
> >
> >
> > Thanks,
> > Avi
> >
>

