lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: NumericRangeQuery performance with 1/2 billion documents in the index
Date Sat, 02 Jan 2010 21:46:13 GMT
The information you gave us is a little spare.
- What JVM do you use, what processor,...
- How many documents match the query? NRQ is very fast, but if your range
hits e.g. one third of all documents, the hit collection of 166 mill docs
also takes lots of time. 7 seconds is normal for this case. Even with 50 mio
docs in the result range, collection would take in the seconds area for most
cpus.
- Why do you index and query with precision step 1? I would first try 6 or 4
with long fields. With too low precSteps, queries get slower because you
have a very, very large term index (64 terms per value!) and your query has
to reposition the term index very often.
- Why do you index NULL values as an integer (not long!) field with value 0?
Those fiels are useless for your query and will never match any range on
LONG values. So why not simply remove them? They also produce lots of terms
with precStep=1 (32 terms).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Kumanan [mailto:kumanan@gmail.com]
> Sent: Saturday, January 02, 2010 8:03 PM
> To: java-user@lucene.apache.org
> Subject: NumericRangeQuery performance with 1/2 billion documents in the
> index
> 
> Hi,
> 
> We have an index with 500 million documents in the index. Index size is
> 104
> GB and 4 GB RAM for the search server.
> 
> When we try to do NumericRangeQuery on document_date field, it takes
> around
> 7-10 seconds. Is this expected for this size index?
> 
> Here is how I index that field.
> 
>             documentDateTimeField = new NumericField(DOCUMENT_DATE_TIME,
> 1,
> Field.Store.NO, true);
>             documentDateTimeField.setOmitNorms(true);
>             documentDateTimeField.setOmitTermFreqAndPositions(true);
> 
>             if(scoreDetails.getDocumentDate() != null) {
> 
> 
> documentDateTimeField.setLongValue(scoreDetails.getDocumentDate().getTime(
> ));
>             } else {
>                 documentDateTimeField.setIntValue(0);
>             }
>             doc.add(documentDateTimeField);
> 
> Here is how I construct the range query.
> 
>                     Long begin = esq.getBeginDate().getTime();
>                     Long end = esq.getEndDate().getTime();
> 
>                     NumericRangeQuery rangeQuery =
> NumericRangeQuery.newLongRange(WordSentenceDocumentFields.DOCUMENT_DATE_TI
> ME,
>                             1, begin, end,
>                             esq.isBeginDateInclusive(),
> esq.isEndDateInclusive());
> 
>                     BooleanQuery bq = new BooleanQuery();
>                     bq.add(query, BooleanClause.Occur.MUST);
>                     bq.add(rangeQuery, BooleanClause.Occur.MUST);
> 
>                     query = bq;
> 
> Am I doing something wrong?
> 
> Thanks
> Kumanan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message