lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: Lucene Challenge - sum, count, avg, etc.
Date Thu, 01 Apr 2010 18:11:09 GMT
If the number of documents ( in this case "Affiliates" )  aren't huge,
sorting can probably be done as a post-process.

Still dont see any need of joins here.


On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau <akaris@gmail.com> wrote:
> Hi,
>
> Here's an example of raw data that would be in my Sales index:
>
> *Affiliate / SaleDate / SaleAmount*
> * mike / 2010-03-01 / 10.00
> * john / 2010-03-01 / 10.00
> * mike / 2010-03-02 / 15.00
> * john / 2010-03-02 /  5.00
> * mike / 2010-03-03 / 20.00
> * john / 2010-03-03 /  1.00
> * mike / 2010-03-04 / 10.00
> * john / 2010-03-04 / 10.00
> * mike / 2010-03-05 / 15.00
> * john / 2010-03-05 /  5.00
> * mike / 2010-03-06 / 20.00
> * john / 2010-03-06 /  1.00
>
> So our 2 affiliates mike and john made 6 sales each between 2010-03-01 and
> 2010-03-06. My ultimate query should return this for a query between
> 2010-03-01 and 2010-03-06 :
>
> *Affiliate / TotalSales*
> * mike / 90.00
> * john / 32.00
>
> So it's exactly like -
>
> SELECT Affiliate, sum(SaleAmount) as TotalSales FROM Sales
>  WHERE SaleDate >= '2010-03-01' AND SaleDate <= '2010-03-06'
>  GROUP BY Affiliate
>  ORDER BY TotalSales DESC;
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 8:11 AM, prasenjit mukherjee <prasen.bea@gmail.com>wrote:
>
>> Not sure what you mean by "joining" in lucene , since conceptually
>> there is only 1 table ( with many field aka columns ) in lucene. A
>> representative query would be good to know the use case.
>>
>> Again didn't get the "sorting" part.  SUM() will return only 1
>> aggregated value, so what do you want to sort it on ?
>>
>> -Prasen
>>
>> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <akaris@gmail.com> wrote:
>> > Are you planning to be able to sort by these SUMs? A SpanQuery would work
>> > great to get the integers... then you would loop and sum up... but what
>> > about "joining" with your other data and sorting?
>> >
>> > - Mike
>> > akaris@gmail.com
>> >
>> >
>> > On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
>> > <prasen.bea@gmail.com>wrote:
>> >
>> >> I too am trying to achieve something.
>> >>
>> >> I am thinking of storing the integer values in  payloads and then
>> >> using spanquery classes to compute the respective SUMs
>> >>
>> >> -Prasen
>> >>
>> >> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <akaris@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > We're currently in the process of switching many of our screens from
>> >> MySQL
>> >> > to Lucene because MySQL simply dies because we have too much data and
>> >> it's
>> >> > becoming too long to generate the stats we need.
>> >> >
>> >> > So here's one MySQL query that we use to find out our Top 10
>> Affiliates :
>> >> >
>> >> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>> >> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>> >> affialiate_id
>> >> > ORDER BY total_sales DESC LIMIT 10;
>> >> >
>> >> > We currently have our "sales" index, containing all sales and all
>> fields
>> >> -
>> >> > and it's one big index (over 10M records). We could fetch all
>> documents
>> >> > within the date range, loop them and add up the total_sales, but it
>> would
>> >> be
>> >> > just crazy to do this all the time (we have a high volume of search).
>> >> >
>> >> > We made several tests with Solr (Facets, and even the beta
>> >> CollapseFields),
>> >> > but nothing is really helping us. We could pre-generate the
>> total_sales
>> >> for
>> >> > all possible date ranges... but that would be quite crazy too as the
>> date
>> >> > range possibilities quickly become endless.
>> >> >
>> >> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>> >> AVG()
>> >> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>> >> seem
>> >> > to offer what I need either.
>> >> >
>> >> > Thanks for any hints!!!
>> >> >
>> >> > - Mike
>> >> > akaris@gmail.com
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message