lucene-java-user mailing list archives

From Chris Lu <chris...@gmail.com>
Subject Re: Lucene Challenge - sum, count, avg, etc.
Date Thu, 01 Apr 2010 21:31:49 GMT
Thanks. I'm not really trying to sell DBSight here, since most people on
this list are Lucene experts.
I just want to confirm that this "challenge" has been solved with Lucene
for quite a while.

The technique is very similar to how facet search is done, and facet
search itself can be implemented in several ways.
Millions of rows are not really "that" big when everything is properly
warmed up.
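
To make the idea concrete, here is a minimal sketch in plain Java. It is
not DBSight's code and uses no Lucene classes. It assumes the grouping
field and the numeric field have already been loaded into arrays indexed
by document ID, and that the search has produced a list of matching
document IDs. All names (AggregateSketch, Stats, groupByDoc, valueByDoc)
are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregateSketch {

    /** Running count/sum/min/max for one group (e.g. one affiliate). */
    static final class Stats {
        final int groupId;
        long count;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        Stats(int groupId) { this.groupId = groupId; }

        void add(double v) {
            count++;
            sum += v;
            if (v < min) min = v;
            if (v > max) max = v;
        }

        double avg() { return count == 0 ? 0.0 : sum / count; }
    }

    /**
     * One pass over the matching documents to aggregate per group, then a
     * sort over the (much smaller) list of groups, descending by sum.
     * groupByDoc and valueByDoc stand in for field values preloaded in
     * memory, indexed by document ID (an assumption of this sketch).
     */
    static List<Stats> aggregate(int[] matchingDocs, int[] groupByDoc, double[] valueByDoc) {
        Map<Integer, Stats> byGroup = new HashMap<>();
        for (int doc : matchingDocs) {
            byGroup.computeIfAbsent(groupByDoc[doc], Stats::new).add(valueByDoc[doc]);
        }
        List<Stats> result = new ArrayList<>(byGroup.values());
        // Sorting happens only after aggregation, on one entry per group.
        result.sort(Comparator.comparingDouble((Stats s) -> s.sum).reversed());
        return result;
    }

    public static void main(String[] args) {
        int[] groupByDoc = {7, 7, 9, 9, 9, 4};                  // group ID per document
        double[] valueByDoc = {10.0, 5.0, 1.0, 2.0, 3.0, 100.0}; // amount per document
        int[] hits = {0, 1, 2, 3, 4};                            // documents matched by the query
        for (Stats s : aggregate(hits, groupByDoc, valueByDoc)) {
            System.out.printf("group=%d count=%d sum=%.1f min=%.1f max=%.1f avg=%.2f%n",
                    s.groupId, s.count, s.sum, s.min, s.max, s.avg());
        }
    }
}
```

The per-document work is an array lookup and a map update, and the sort
only touches one entry per group, which is why the size of the result set
matters less than it might seem.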

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro
funding!



Michel Nadeau wrote:
> I'm sure the DBSight feature is great, but we already have a system in place
> and we're not throwing it away -- it's closely integrated with our whole
> platform. We're way past the point to switch our solution to DBSight.  We'd
> be more than happy to use the DBSight feature if it would be opensource but
> unfortunately it's not - so we won't even consider it.
>
> Chris: are you a developer at DBSight? Can you tell us more about how it
> works?  Because I don't really see how it can be "fast" when dealing with
> millions of records... as it has to loop through them, compute, store
> everything (in a temp index? memory?) and then re-sort.
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <chris.lu@gmail.com> wrote:
>
>   
>> For DBSight, the aggregated values are computed at run time.
>> The sorting on the computed aggregated values is done when displaying
>> the results.
>>
>> The idea is that, after the aggregation, the number of aggregated values
>> is much, much smaller than the number of matching records.
>>
>>
>>
>> prasenjit mukherjee wrote:
>>
>>     
>>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <chris.lu@gmail.com> wrote:
>>>
>>>
>>>       
>>>> No need for Hadoop. It would be even slower. Lucene can do it easily.
>>>>
>>>> This has been implemented in DBSight.
>>>> The implementation is very similar to facet search. You just need a way
>>>> to load the field quickly, e.g. by keeping it in memory or in some data
>>>> structure, and to compute the sum/min/max during searching.
>>>>
>>>>
>>>>         
>>> This will ONLY compute the aggregated values (sum, count, min, max,
>>> etc.). I guess what Mike wants is to use the aggregated value to sort
>>> the entries. Dynamically maintaining a sorted list while searching
>>> could be extremely expensive.
>>>
>>>
>>>
>>>
>>>       
>>>>
>>>>
>>>> prasenjit mukherjee wrote:
>>>>
>>>>
>>>>         
>>>>> This looks like a use case better suited to Pig (over Hadoop).
>>>>>
>>>>> It could be difficult for Lucene to sort and sum simultaneously, as
>>>>> the sort order itself depends on the summed value.
>>>>>
>>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <akaris@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Well, that's my problem: we have a lot of records of all types
>>>>>> (affiliates, sales), so looping over tons of records each time
>>>>>> isn't possible.
>>>>>>
>>>>>> - Mike
>>>>>> akaris@gmail.com
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>>> <prasen.bea@gmail.com>wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>
>>>
>>>
>>>       
>
>   
