lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: Lucene Challenge - sum, count, avg, etc.
Date Thu, 01 Apr 2010 13:25:56 GMT
> Lucene is great at searching for data, but just because it is awesome in one area doesn't
mean it would excel in something it wasn't designed for ;-)

I think lucene is probably one of the better data structures for
computing "conditional aggregated stats". Even for straight search
lucene has to iterate over all the matches and compute the score which
is probably much more computation intensive than doing simple
aggregations ( like SUM,COUNT,AVG,MIN,MAX etc. ). Makes sense for a
contrib project.

I have a feeling that lucene will be faster than MySQL for these kind
of conditional aggregation tasks. But would reserve my comment and
would appreciate expert advice on this.

Having said that I think you are right that lucene shouldn't be used
as a general purpose Analytical tool.

-Prasen


On Thu, Apr 1, 2010 at 8:51 AM, Darren Hartford <dhartford@ghsinc.com> wrote:
> If you are going to end up either copying or moving all the data to lucene (which, when
you hook up lucene even to the existing mysql data, it will still create it's own copy of
the data), you might really want to look at other options:
>
> *column oriented databases (analytical databases).  If open source is a concern: LucidDB,
MonetDB, mysql/InfoBright (GPL/commercial), and I think mysql/InfiniDB may be getting into
it.  And, you can still use your normal SQL tools with better performance for this particular
scenario.
>
> Search engine is about searching for data....trying to turn it into a (financial) analytical
tool may get you more trouble particularly if you haven't looked at other options.
>
> Lucene is great at searching for data, but just because it is awesome in one area doesn't
mean it would excel in something it wasn't designed for ;-)
>
> -D
>
>
> -----Original Message-----
> From: prasenjit mukherjee [mailto:prasen.bea@gmail.com]
> Sent: Thursday, April 01, 2010 8:11 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Challenge - sum, count, avg, etc.
>
> Not sure what you mean by "joining" in lucene , since conceptually
> there is only 1 table ( with many field aka columns ) in lucene. A
> representative query would be good to know the use case.
>
> Again didn't get the "sorting" part.  SUM() will return only 1
> aggregated value, so what do you want to sort it on ?
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <akaris@gmail.com> wrote:
>> Are you planning to be able to sort by these SUMs? A SpanQuery would work
>> great to get the integers... then you would loop and sum up... but what
>> about "joining" with your other data and sorting?
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>> On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
>> <prasen.bea@gmail.com>wrote:
>>
>>> I too am trying to achieve something.
>>>
>>> I am thinking of storing the integer values in  payloads and then
>>> using spanquery classes to compute the respective SUMs
>>>
>>> -Prasen
>>>
>>> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <akaris@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > We're currently in the process of switching many of our screens from
>>> MySQL
>>> > to Lucene because MySQL simply dies because we have too much data and
>>> it's
>>> > becoming too long to generate the stats we need.
>>> >
>>> > So here's one MySQL query that we use to find out our Top 10 Affiliates
:
>>> >
>>> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>>> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>>> affialiate_id
>>> > ORDER BY total_sales DESC LIMIT 10;
>>> >
>>> > We currently have our "sales" index, containing all sales and all fields
>>> -
>>> > and it's one big index (over 10M records). We could fetch all documents
>>> > within the date range, loop them and add up the total_sales, but it would
>>> be
>>> > just crazy to do this all the time (we have a high volume of search).
>>> >
>>> > We made several tests with Solr (Facets, and even the beta
>>> CollapseFields),
>>> > but nothing is really helping us. We could pre-generate the total_sales
>>> for
>>> > all possible date ranges... but that would be quite crazy too as the date
>>> > range possibilities quickly become endless.
>>> >
>>> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>>> AVG()
>>> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>>> seem
>>> > to offer what I need either.
>>> >
>>> > Thanks for any hints!!!
>>> >
>>> > - Mike
>>> > akaris@gmail.com
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message