lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Grouping on multiple shards possible in lucene?
Date Fri, 16 Nov 2012 16:10:23 GMT
Yes, this is possible using Lucene's grouping APIs.

It looks like index-time grouping won't work, since documents for the
same parent are spread out across time (and so across shards), but you
can use two-pass grouping instead: run the FirstPassGroupingCollector
on each shard, get the top groups from each, merge those and pick the
top N groups, run SecondPassGroupingCollector on each shard with those
groups to get its TopGroups, and then use TopGroups.merge to merge the
results.

Lucene provides the APIs to do this ... but it's up to you to send
requests out to other shards, gather the results, call the merge, etc.
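
To make the flow concrete, here is a rough sketch of the coordinator
logic in plain Java. It does not call Lucene at all: the Doc and Group
classes and the firstPass, secondPass, topGroups and search methods are
hypothetical stand-ins for the collectors and merge steps named above,
and each shard is just an in-memory list. It assumes groups are sorted
by their newest date, descending, and that all six rows from the table
in the question match the query.

```java
import java.util.*;
import java.util.stream.*;

public class ShardedGroupingSketch {

    /** One indexed row: entity id, parent (the group key), sortable date as yyyyMMdd. */
    static final class Doc {
        final String entity; final String parent; final int date;
        Doc(String entity, String parent, int date) {
            this.entity = entity; this.parent = parent; this.date = date;
        }
    }

    /** A candidate group: its key and the newest date seen for it so far. */
    static final class Group {
        final String parent; final int newestDate;
        Group(String parent, int newestDate) { this.parent = parent; this.newestDate = newestDate; }
    }

    /** Newest group first; ties broken by parent so the order is deterministic. */
    static final Comparator<Group> NEWEST_GROUP_FIRST = (a, b) -> {
        int byDate = Integer.compare(b.newestDate, a.newestDate);
        return byDate != 0 ? byDate : a.parent.compareTo(b.parent);
    };

    /** Keeps each parent's newest date, then returns the top-N groups, newest first. */
    static List<Group> topGroups(Collection<Group> candidates, int topN) {
        Map<String, Integer> newest = new HashMap<>();
        for (Group g : candidates) newest.merge(g.parent, g.newestDate, Integer::max);
        return newest.entrySet().stream()
                .map(e -> new Group(e.getKey(), e.getValue()))
                .sorted(NEWEST_GROUP_FIRST)
                .limit(topN)
                .collect(Collectors.toList());
    }

    /** Pass 1 on one shard: that shard's own top-N groups. */
    static List<Group> firstPass(List<Doc> shard, int topN) {
        List<Group> perDoc = new ArrayList<>();
        for (Doc d : shard) perDoc.add(new Group(d.parent, d.date));
        return topGroups(perDoc, topN);
    }

    /** Pass 2 on one shard: that shard's docs, restricted to the chosen groups. */
    static Map<String, List<Doc>> secondPass(List<Doc> shard, List<Group> chosen) {
        Map<String, List<Doc>> docs = new HashMap<>();
        for (Group g : chosen) docs.put(g.parent, new ArrayList<>());
        for (Doc d : shard) {
            List<Doc> bucket = docs.get(d.parent);
            if (bucket != null) bucket.add(d);
        }
        return docs;
    }

    /** Coordinator: pass 1 everywhere, merge groups, pass 2 everywhere, merge docs. */
    static Map<String, List<String>> search(List<List<Doc>> shards, int topN) {
        List<Group> candidates = new ArrayList<>();
        for (List<Doc> shard : shards) candidates.addAll(firstPass(shard, topN));
        List<Group> chosen = topGroups(candidates, topN);

        // LinkedHashMap keeps the merged group order.
        Map<String, List<Doc>> merged = new LinkedHashMap<>();
        for (Group g : chosen) merged.put(g.parent, new ArrayList<>());
        for (List<Doc> shard : shards) {
            secondPass(shard, chosen).forEach((parent, docs) -> merged.get(parent).addAll(docs));
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        merged.forEach((parent, docs) -> {
            docs.sort((a, b) -> Integer.compare(b.date, a.date)); // newest doc first
            result.put(parent, docs.stream().map(d -> d.entity).collect(Collectors.toList()));
        });
        return result;
    }

    /** Shards A1..A4 from the table in the question. */
    static List<List<Doc>> exampleShards() {
        return Arrays.asList(
            Arrays.asList(new Doc("M1", "C1", 20101112)),
            Arrays.asList(new Doc("M2", "C2", 20111112)),
            Arrays.asList(new Doc("M3", "C4", 20120212)),
            Arrays.asList(new Doc("M4", "C1", 20121112),
                          new Doc("M5", "C2", 20121113),
                          new Doc("M6", "C3", 20121114)));
    }

    public static void main(String[] args) {
        System.out.println(search(exampleShards(), 4));
    }
}
```

With those six rows the groups come back as C3, C2, C1, C4. Asking each
shard for only its top N groups is enough here because a group's newest
date overall is the newest of its per-shard dates, so any group in the
global top N is also in the top N of at least one shard. In a real
deployment the two loops over shards would be remote calls.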

Mike McCandless

http://blog.mikemccandless.com

On Fri, Nov 16, 2012 at 9:43 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> The formatter has wrecked the table... Reposting it
>
> Please read it as follows
>
> {ENTITY,PARENT,DATE,SHARD} tuple
>
> M1  C1  12/11/2010  A1
> M2  C2  12/11/2011  A2
> M3  C4  12/02/2012  A3
> M4  C1  12/11/2012  A4
> M5  C2  13/11/2012  A4
> M6  C3  14/11/2012  A4
>
> I need to group this based on parents ordered by time. The shards
> themselves are in increasing order of time {A1-A4 in ascending order of
> time}
>
> So, if for some search, the entities matched are M1,M2,M3,M4&M6, the set of
> results returned should be *C3,C2,C1,C4*
>
> I am aware of grouping search in Lucene, but is it possible to extend it
> to multiple shards? More importantly, are there ways I can reorganize my
> documents at index time to optimize query performance for such a
> grouping feature?
>
> --
> Ravi
>
>
> On Fri, Nov 16, 2012 at 8:05 PM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
>> We are trying to do a grouping search that spans multiple shards ordered
>> by time.
>>
>>
>> ENTITY  PARENT  TIME         SHARD
>> M1      C1      12-Nov-2010  A1
>> M2      C2      12-Nov-2011  A2
>> M3      C4      12-Feb-2012  A3
>> M4      C1      12-Nov-2012  A4
>> M5      C2      13-Nov-2012  A4
>> M6      C3      14-Nov-2012  A4
>>
>> I need to group this based on parents ordered by time. The shards
>> themselves are in increasing order of time {A1-A4 in ascending order of
>> time}
>>
>> So, if for some search, the entities matched are M1,M2,M3,M4&M6, the set
>> of results returned should be *C3,C2,C1,C4*
>>
>> I am aware of grouping search in Lucene, but is it possible to extend it
>> to multiple shards? More importantly, are there ways I can reorganize my
>> documents at index time to optimize query performance for such a
>> grouping feature?
>>
>> --
>> Ravi
>>
