lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: NumericField indexing performance
Date Wed, 14 Apr 2010 21:38:03 GMT
One addition:
If you are indexing millions of numeric fields, you should also try to reuse NumericField
and Document instances (as described in JavaDocs). NumericField creates internally a NumericTokenStream
and lots of small objects (attributes), so GC cost may be high. This is just another idea.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Wednesday, April 14, 2010 11:28 PM
> To: java-user@lucene.apache.org
> Subject: RE: NumericField indexing performance
> 
> Hi Tomislav,
> 
> indexing with NumericField takes longer (at least for the default
> precision step of 4, which means out of 32 bit integers make 8 subterms
> with each 4 bits of the value). So you produce 8 times more terms
> during indexing that must be handled by the indexer. If you have lots
> of documents, with distinct values the term index gets larger and
> larger, but search performance increases dramatically (for
> NumericRangeQueries). So if you index *only* numeric fields and nothing
> else, a 8 times slower indexing can be true.
> 
> If you are not using NumericRangeQuery or you want tune indexing
> performance, try larger precision Steps like 6 or 8. If you don’t use
> NumericRangeQuery and only want to index the numeric terms as *one*
> term, use precStep=Integer.MAX_VALUE. Also check your memory
> requirements, as the indexer may need more memory and GC costs too
> much. Also the index size will increase, so lots of more I/O is done.
> Without more details I cannot say anything about your configuration. So
> please tell us, how many documents, how many fields and how many
> numeric fields in which configuration do you use?
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: Tomislav Poljak [mailto:tpoljak@gmail.com]
> > Sent: Wednesday, April 14, 2010 8:13 PM
> > To: java-user@lucene.apache.org
> > Subject: NumericField indexing performance
> >
> > Hi,
> > is it normal for indexing time to increase up to 10 times after
> > introducing NumericField instead of Field (for two fields)?
> >
> > I've changed two date fields from String representation (Field) to
> > NumericField, now it is:
> >
> > doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600))
> >
> > and after this change indexing took 10x more time (before it was few
> > minutes and after more than an hour and half). I've tested with a
> > simple
> > counter like this:
> >
> > doc.add(new NumericField("endTime").setIntValue(count++))
> >
> > but nothing changed, it still takes around 10x longer. If I comment
> > adding one numeric field to index time drops significantly and if I
> > comment both fields indexing takes only few minutes again.
> >
> > Tomislav
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message