lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michel Nadeau <aka...@gmail.com>
Subject Lucene Challenge - sum, count, avg, etc.
Date Thu, 01 Apr 2010 01:17:27 GMT
Hi,

We're currently in the process of switching many of our screens from MySQL
to Lucene because MySQL simply dies because we have too much data and it's
becoming too long to generate the stats we need.

So here's one MySQL query that we use to find out our Top 10 Affiliates :

SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
ORDER BY total_sales DESC LIMIT 10;

We currently have our "sales" index, containing all sales and all fields -
and it's one big index (over 10M records). We could fetch all documents
within the date range, loop them and add up the total_sales, but it would be
just crazy to do this all the time (we have a high volume of search).

We made several tests with Solr (Facets, and even the beta CollapseFields),
but nothing is really helping us. We could pre-generate the total_sales for
all possible date ranges... but that would be quite crazy too as the date
range possibilities quickly become endless.

So - is there any known way to efficiently do SUM(), COUNT() (and even AVG()
) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
to offer what I need either.

Thanks for any hints!!!

- Mike
akaris@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message