lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neil Ireson <n.ire...@sheffield.ac.uk>
Subject Pivot performance
Date Thu, 13 Nov 2014 13:49:13 GMT
Hi all,

I was running an experiment which involved counting terms by day, so I was using pivot facets
to get the counts. However as the number of time and term values increased the performance
got very rubbish. So I knocked up a quick test, using a collection of 1 million documents
with a different number of random values, to compare different ways of getting the counts.

1) Combined = combining the time and term in a single field.
2) Facet = for each term set the query to the term and then get the time facet 
3) Pivot = get the pivot facet.

The results show that, as the number of values (i.e. number of terms * number of times) increases,
everything is fine until around 100,000 values and then it goes pair-shaped for pivots, taking
nearly 4 minutes for 1 million values, the facet based approach produces much more robust
performance.

          |      Processing time in ms     |
Values    |  Combined|     Facet|     Pivot|
9         |       144|       391|        62|
100       |       170|       501|        52|
961       |       789|      1014|       153|
10000     |       907|      1966|       940|
99856     |      1647|      3832|      1960|
499849    |      5445|      7573|    136423|
999867    |      9926|      8690|    233725|


In the end I used the facet rather than pivot approach but I’d like to know why pivots have
such a catastrophic performance crash? Is this an expected behaviour of pivots or am I doing
something wrong?

N


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message