lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: NumericField indexing performance
Date Wed, 14 Apr 2010 21:28:17 GMT
Hi Tomislav,

indexing with NumericField takes longer (at least with the default precision step of 4, which
means that each 32-bit integer is indexed as 8 subterms, one per 4 bits of the value). So you
produce 8 times as many terms during indexing, all of which must be handled by the indexer.
If you have lots of documents with distinct values, the term index gets larger and larger, but
search performance increases dramatically (for NumericRangeQuery). So if you index *only*
numeric fields and nothing else, indexing that is 8 times slower is plausible.

If you are not using NumericRangeQuery or you want to tune indexing performance, try larger
precision steps like 6 or 8. If you don't use NumericRangeQuery and only want to index each
numeric value as *one* term, use precStep=Integer.MAX_VALUE. Also check your memory
requirements, as the indexer may need more memory and GC may cost too much time. The index
size will also increase, so a lot more I/O is done. Without more details I cannot say anything
about your configuration. So please tell us: how many documents, how many fields, and how many
numeric fields do you have, and in which configuration do you use them?
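
To make the arithmetic above concrete, here is a minimal sketch in plain Java (no Lucene dependency) that counts how many terms one 32-bit value produces for a given precision step. It assumes one term is emitted per shift 0, precStep, 2*precStep, ... while the shift stays below the bit width of the value. The class and method names (PrecisionStepTerms, shifts, termsPerValue) are hypothetical and only for illustration; they are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not a Lucene class.
class PrecisionStepTerms {

    // Returns the shifts at which one term is produced for a value of the
    // given bit width. Assumption: one term per shift 0, step, 2*step, ...
    // as long as the shift is smaller than the bit width.
    static List<Integer> shifts(int valueBits, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> result = new ArrayList<>();
        // The loop variable is a long so that a step of Integer.MAX_VALUE
        // cannot overflow when it is added to the shift.
        for (long shift = 0; shift < valueBits; shift += precisionStep) {
            result.add((int) shift);
        }
        return result;
    }

    // Number of terms indexed per value for the given precision step.
    static int termsPerValue(int valueBits, int precisionStep) {
        return shifts(valueBits, precisionStep).size();
    }

    public static void main(String[] args) {
        int[] steps = {4, 6, 8, Integer.MAX_VALUE};
        for (int step : steps) {
            System.out.println("precisionStep=" + step + " -> "
                    + termsPerValue(32, step) + " term(s) per int value");
        }
    }
}
```

Under that assumption, a step of 4 gives 8 terms per int value, a step of 8 gives 4, and Integer.MAX_VALUE gives exactly one, which is why a larger precision step reduces the work done per document at index time.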


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Tomislav Poljak []
> Sent: Wednesday, April 14, 2010 8:13 PM
> To:
> Subject: NumericField indexing performance
> Hi,
> is it normal for indexing time to increase up to 10 times after
> introducing NumericField instead of Field (for two fields)?
> I've changed two date fields from String representation (Field) to
> NumericField, now it is:
> doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600))
> and after this change indexing took 10x longer (before it took a few
> minutes, afterwards more than an hour and a half). I've tested with a
> simple counter like this:
> doc.add(new NumericField("endTime").setIntValue(count++))
> but nothing changed, it still takes around 10x longer. If I comment
> out one numeric field, indexing time drops significantly, and if I
> comment out both fields, indexing takes only a few minutes again.
> Tomislav
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
