lucene-java-user mailing list archives

From Fuad Efendi
Subject Re: Understanding performance characteristics of the new point types
Date Wed, 02 Nov 2016 18:43:35 GMT
Hi Florian,

If my understanding is correct, you are using IntPoint to index 4 different
document types, which is overkill. Why not try a classic “non-tokenized”
keyword field (a.k.a. “legacy string”) for the document types? The
cardinality is only four.


Fuad Efendi

(416) 993-2060
Recommender Systems

On November 2, 2016 at 2:10:14 PM, Florian Hopf wrote:


We are indexing different types of documents in one Lucene index. They
have most fields in common but we need to filter some types for certain
queries. We are using numeric values to determine the types of documents
(1-4). Now, when querying these documents we see that the performance
degrades the more documents of a type are in the index.

Using a simple test that indexes 10 million documents I can see the
following when filtering on everything but 100,000 documents:

* When issuing the query alone, the new PointRangeQuery
(IntPoint.newExactQuery) is a lot faster than term and legacy numeric
(in my case around 2x the speed of the others)
* When issuing a bool query that combines a term query selecting 5
documents with a MUST query on the numeric type, the points are 5x
slower than legacy numeric (LegacyNumericRangeQuery.newIntRange) and
terms (TermQuery)
* When doing the same thing with SHOULD instead of MUST for the
additional term query, the PointRangeQuery is the fastest as well
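
One possible reading of the MUST result above, offered as an assumption and
not as a description of Lucene's actual code: if the point query collects all
of its matches up front while a term query can be advanced lazily by the
selective clause, the point variant does work for roughly 9.9 million
documents just to confirm 5. The plain-Java sketch below models only that
difference. The names ConjunctionCostSketch, allBut, eager and lazy are
hypothetical, no Lucene API is involved, and the step counts are a toy cost
model, not a benchmark.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy cost model for a two-clause conjunction; hypothetical, not Lucene code. */
final class ConjunctionCostSketch {

    /** Outcome of one strategy: matching docs and a rough count of work done. */
    static final class Result {
        final int hits;
        final long steps;

        Result(int hits, long steps) {
            this.hits = hits;
            this.steps = steps;
        }
    }

    /** Sorted IDs of all docs in [0, maxDoc) except every skipEvery-th one. */
    static int[] allBut(int maxDoc, int skipEvery) {
        int[] docs = new int[maxDoc];
        int n = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc % skipEvery != 0) {
                docs[n++] = doc;
            }
        }
        return Arrays.copyOf(docs, n);
    }

    /** Eager: write every doc matched by the broad filter into a bitset, then probe it. */
    static Result eager(int[] filterDocs, int[] selectiveDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        long steps = 0;
        for (int doc : filterDocs) {
            bits.set(doc);
            steps++; // one step per bitset write
        }
        int hits = 0;
        for (int doc : selectiveDocs) {
            steps++; // one step per probe
            if (bits.get(doc)) {
                hits++;
            }
        }
        return new Result(hits, steps);
    }

    /** Lazy: the selective clause leads; each candidate is binary-searched in the filter. */
    static Result lazy(int[] filterDocs, int[] selectiveDocs) {
        int hits = 0;
        long steps = 0;
        for (int doc : selectiveDocs) {
            int lo = 0;
            int hi = filterDocs.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                steps++; // one step per comparison
                if (filterDocs[mid] < doc) {
                    lo = mid + 1;
                } else if (filterDocs[mid] > doc) {
                    hi = mid - 1;
                } else {
                    hits++;
                    break;
                }
            }
        }
        return new Result(hits, steps);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;             // scaled down from the 10 million above
        int[] filter = allBut(maxDoc, 100); // broad filter: excludes 1% of the docs
        int[] selective = {7, 4_200, 99_999, 500_000, 912_345}; // 5 arbitrary docs
        Result e = eager(filter, selective, maxDoc);
        Result l = lazy(filter, selective);
        System.out.println("eager: hits=" + e.hits + " steps=" + e.steps);
        System.out.println("lazy:  hits=" + l.hits + " steps=" + l.steps);
    }
}
```

Both strategies return the same hits; only the amount of work differs. The
sketch does not model the SHOULD case, where every match of the broad clause
has to be visited anyway.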

I suspect this to be related to the discussion in

Of course there could be something wrong with the way I am measuring the
performance; I'd be happy to share the code. But what I read in the
ticket above seems to hint that the points are not suited for every use
case. Is it recommended to use StringField in a case like this instead?


Florian Hopf
Freelance Software Developer
