lucene-java-user mailing list archives

From Michel Nadeau <aka...@gmail.com>
Subject Re: Lucene Challenge - sum, count, avg, etc.
Date Fri, 02 Apr 2010 03:13:55 GMT
My big question is: how do you loop over 1M records, sum up one or more
fields, and then sort on the summed value? All in memory (could use too much
RAM)? Or in a temporary index (could take a while to rewrite a lot of
documents into a new index)?

- Mike
akaris@gmail.com
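
A rough sketch of the in-memory option discussed in this thread: aggregate
per group in a single pass over the hits, then sort only the (much smaller)
set of aggregated values. This is plain Java, not Lucene or DBSight API.
The class and method names (AggregateSketch, Stats, aggregate, topBySum) and
the arrays groupOfDoc / valueOfDoc are hypothetical stand-ins for field
values that have already been loaded into memory, indexed by document id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: names and data layout are illustrative only.
class AggregateSketch {

    /** Running aggregates for one group (for example, one affiliate). */
    static final class Stats {
        final int key;
        long count;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        Stats(int key) { this.key = key; }

        void add(double v) {
            count++;
            sum += v;
            if (v < min) min = v;
            if (v > max) max = v;
        }

        double avg() { return count == 0 ? 0.0 : sum / count; }
    }

    /**
     * One pass over the matching document ids. groupOfDoc[doc] and
     * valueOfDoc[doc] are assumed to be preloaded per-document field values.
     * Memory grows with the number of distinct groups, not the number of hits.
     */
    static Map<Integer, Stats> aggregate(int[] matchingDocs, int[] groupOfDoc, double[] valueOfDoc) {
        Map<Integer, Stats> byGroup = new HashMap<>();
        for (int doc : matchingDocs) {
            byGroup.computeIfAbsent(groupOfDoc[doc], Stats::new).add(valueOfDoc[doc]);
        }
        return byGroup;
    }

    /**
     * Top n groups by sum, largest first. A min-heap bounded to n entries
     * avoids keeping a fully sorted list while scanning the groups.
     */
    static List<Stats> topBySum(Map<Integer, Stats> byGroup, int n) {
        Comparator<Stats> bySum = Comparator.comparingDouble((Stats s) -> s.sum);
        PriorityQueue<Stats> heap = new PriorityQueue<>(bySum);
        for (Stats s : byGroup.values()) {
            heap.offer(s);
            if (heap.size() > n) {
                heap.poll(); // drop the group with the smallest sum so far
            }
        }
        List<Stats> out = new ArrayList<>(heap);
        out.sort(bySum.reversed());
        return out;
    }
}
```

Nothing is kept sorted while the hits are being scanned; ordering happens
once, after aggregation, over one entry per group. Whether this fits in RAM
depends on how many distinct groups there are and on how the per-document
arrays are loaded, which this sketch leaves open.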


On Thu, Apr 1, 2010 at 5:31 PM, Chris Lu <chris.lu@gmail.com> wrote:

> Thanks. Not really trying to sell DBSight here since most people here are
> Lucene experts.
> Just to confirm that this "challenge" has been done via Lucene for quite a
> while.
>
> The technique for it is very similar to how facet search is done, which can
> also be done in several ways.
> Millions of rows are not really "that" big when everything is properly
> warmed up.
>
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
>
> Michel Nadeau wrote:
>
>> I'm sure the DBSight feature is great, but we already have a system in
>> place and we're not throwing it away -- it's closely integrated with our
>> whole platform. We're way past the point of switching our solution to
>> DBSight. We'd be more than happy to use the DBSight feature if it were
>> open source, but unfortunately it's not, so we won't even consider it.
>>
>> Chris: are you a developer at DBSight? Can you tell us more about how it
>> works?  Because I don't really see how it can be "fast" when dealing with
>> millions of records... as it has to loop through them, compute, store
>> everything (in a temp index? memory?) and then re-sort.
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>> On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <chris.lu@gmail.com> wrote:
>>
>>
>>
>>> For DBSight, the aggregated values are computed at run time,
>>> and the sorting on the computed aggregated values is done when
>>> displaying the results.
>>>
>>> The idea is that, after the aggregation, the number of aggregated values
>>> is much, much smaller than the number of records.
>>>
>>>
>>>
>>>
>>> prasenjit mukherjee wrote:
>>>
>>>
>>>
>>>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <chris.lu@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> No need for Hadoop. It's even slower. Lucene can do it easily.
>>>>>
>>>>> This has been implemented in DBSight.
>>>>> The implementation is very similar to facet search. You just need a way
>>>>> to load the field quickly, e.g. by putting it in memory or in some data
>>>>> structure, and compute the sum/min/max during searching.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> This will ONLY compute the aggregated values (sum, count, min, max,
>>>> etc.). I guess what Mike wants is to use the aggregated value to sort
>>>> the entries. Dynamically maintaining a sorted list while searching could
>>>> be extremely expensive.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> prasenjit mukherjee wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> This looks like a use case more suited to Pig (over Hadoop).
>>>>>>
>>>>>> It could be difficult for Lucene to sort and sum simultaneously, as
>>>>>> the sorting itself depends on the summed value.
>>>>>>
>>>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <akaris@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Well, that's my problem: we have a lot of records of all types
>>>>>>> (affiliates, sales), so looping over tons of records each time isn't
>>>>>>> possible.
>>>>>>>
>>>>>>> - Mike
>>>>>>> akaris@gmail.com
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>>>> <prasen.bea@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
