lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Grouping on multiple shards possible in lucene?
Date Tue, 20 Nov 2012 06:49:21 GMT
Thanks, Mike. Actually, I think I can eliminate the sort by time if I am able
to iterate postings in reverse doc-id order. Is this possible in Lucene?
Also, for a top-N query sorted by doc-id, will the query terminate early?

--
Ravi

On Fri, Nov 16, 2012 at 9:40 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Yes, this is possible using Lucene's grouping APIs.
>
> It looks like index-time grouping won't work, since you get the same
> parent spread out across time, but you can use the two-pass grouping
> instead ... run the FirstPassGroupingCollector on each shard, get the
> top groups from each, merge those and pick the top N groups, run
> SecondPassGroupingCollector to get TopGroups from each shard, and then
> use TopGroups.merge to merge the results.
>
> Lucene provides the APIs to do this ... but it's up to you to send
> requests out to other shards, gather the results, call the merge, etc.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Nov 16, 2012 at 9:43 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > The formatter has wrecked the table... Reposting it
> >
> > Please read it as follows
> >
> > {ENTITY,PARENT,DATE,SHARD} tuple
> >
> > M1  C1  12/11/2010  A1
> > M2  C2  12/11/2011  A2
> > M3  C4  12/02/2012  A3
> > M4  C1  12/11/2012  A4
> > M5  C2  13/11/2012  A4
> > M6  C3  14/11/2012  A4
> >
> > I need to group this based on parents ordered by time. The shards
> > themselves are in increasing order of time {A1-A4 in ascending order of
> > time}
> >
> > So, if for some search, the entities matched are M1,M2,M3,M4&M6, the set
> > of results returned should be *C3,C2,C1,C4*
> >
> > I am aware of grouping search in Lucene, but is extending it to multiple
> > shards possible? More importantly, are there ways by which I can
> > re-organize my Documents during index-time to optimize query performance
> > for such a grouping feature?
> >
> > --
> > Ravi
> >
> >
> > On Fri, Nov 16, 2012 at 8:05 PM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> >> We are trying to do a grouping search that spans multiple shards ordered
> >> by time.
> >>
> >>
> >> *ENTITY                        PARENT
> >>     TIME                    SHARD*
> >> M1                                     C1
> >>            12-Nov-2010           A1
> >> M2                                     C2
> >>            12-Nov-2011           A2
> >> M3                                     C4
> >>            12-Feb-2012           A3
> >> M4                                     C1
> >>            12-Nov-2012           A4
> >> M5                                     C2
> >>            13-Nov-2012           A4
> >> M6                                     C3
> >>            14-Nov-2012           A4
> >>
> >> I need to group this based on parents ordered by time. The shards
> >> themselves are in increasing order of time {A1-A4 in ascending order of
> >> time}
> >>
> >> So, if for some search, the entities matched are M1,M2,M3,M4&M6, the set
> >> of results returned should be *C3,C2,C1,C4*
> >>
> >> I am aware of grouping search in Lucene, but is extending it to multiple
> >> shards possible? More importantly, are there ways by which I can
> >> re-organize my Documents during index-time to optimize query performance
> >> for such a grouping feature?
> >>
> >> --
> >> Ravi
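
A minimal sketch of the two-pass flow Mike describes above: first pass on each shard, merge and pick the top N groups, second pass on each shard, then a final merge. This is plain Java over in-memory lists, not Lucene code. The class and method names (ShardedGroupingSketch, firstPass, mergeFirstPass, secondPass, mergeSecondPass) are hypothetical stand-ins for the roles that FirstPassGroupingCollector, SecondPassGroupingCollector and TopGroups.merge play in the thread, and it assumes a group is ranked by the date of its newest document. For simplicity it treats all six rows of the table as matches.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: plain Java collections stand in for Lucene shards and collectors.
public class ShardedGroupingSketch {

    /** One matching document: entity id, parent (the group key) and date. */
    public static final class Doc {
        final String entity;
        final String parent;
        final LocalDate date;
        Doc(String entity, String parent, LocalDate date) {
            this.entity = entity; this.parent = parent; this.date = date;
        }
    }

    /** Keeps the n entries with the newest dates, newest first. */
    private static Map<String, LocalDate> newestFirst(Map<String, LocalDate> byParent, int n) {
        Map<String, LocalDate> top = new LinkedHashMap<>();
        byParent.entrySet().stream()
                .sorted(Map.Entry.<String, LocalDate>comparingByValue().reversed())
                .limit(n)
                .forEachOrdered(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    /** Pass 1, run on each shard: its top-n parents ranked by newest document. */
    public static Map<String, LocalDate> firstPass(List<Doc> shard, int n) {
        Map<String, LocalDate> newest = new HashMap<>();
        for (Doc d : shard) newest.merge(d.parent, d.date, (a, b) -> a.isAfter(b) ? a : b);
        return newestFirst(newest, n);
    }

    /** Coordinator: merge the per-shard pass-1 results and pick the global top-n parents. */
    public static List<String> mergeFirstPass(List<Map<String, LocalDate>> perShard, int n) {
        Map<String, LocalDate> newest = new HashMap<>();
        for (Map<String, LocalDate> shard : perShard)
            shard.forEach((p, date) -> newest.merge(p, date, (a, b) -> a.isAfter(b) ? a : b));
        return new ArrayList<>(newestFirst(newest, n).keySet());
    }

    /** Pass 2, run on each shard: collect documents for the chosen parents only. */
    public static Map<String, List<Doc>> secondPass(List<Doc> shard, List<String> parents) {
        return shard.stream()
                .filter(d -> parents.contains(d.parent))
                .collect(Collectors.groupingBy(d -> d.parent));
    }

    /** Coordinator: merge pass-2 results; groups keep the pass-1 order, docs are newest first. */
    public static Map<String, List<String>> mergeSecondPass(
            List<Map<String, List<Doc>>> perShard, List<String> parents) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (String parent : parents) {
            List<Doc> docs = new ArrayList<>();
            for (Map<String, List<Doc>> shard : perShard)
                docs.addAll(shard.getOrDefault(parent, new ArrayList<>()));
            docs.sort(Comparator.comparing((Doc d) -> d.date).reversed());
            merged.put(parent, docs.stream().map(d -> d.entity).collect(Collectors.toList()));
        }
        return merged;
    }

    /** Runs both passes over every shard, merging after each one. */
    public static Map<String, List<String>> search(List<List<Doc>> shards, int n) {
        List<String> parents = mergeFirstPass(
                shards.stream().map(s -> firstPass(s, n)).collect(Collectors.toList()), n);
        return mergeSecondPass(
                shards.stream().map(s -> secondPass(s, parents)).collect(Collectors.toList()), parents);
    }

    /** The six rows from the thread, one inner list per shard A1..A4. */
    public static List<List<Doc>> sampleShards() {
        return List.of(
                List.of(new Doc("M1", "C1", LocalDate.of(2010, 11, 12))),
                List.of(new Doc("M2", "C2", LocalDate.of(2011, 11, 12))),
                List.of(new Doc("M3", "C4", LocalDate.of(2012, 2, 12))),
                List.of(new Doc("M4", "C1", LocalDate.of(2012, 11, 12)),
                        new Doc("M5", "C2", LocalDate.of(2012, 11, 13)),
                        new Doc("M6", "C3", LocalDate.of(2012, 11, 14))));
    }

    public static void main(String[] args) {
        // Group order with all six rows matching: C3, C2, C1, C4
        search(sampleShards(), 4).forEach((parent, entities) ->
                System.out.println(parent + " -> " + entities));
    }
}
```

In a real deployment each firstPass and secondPass call would run on a remote shard, and only the two merge steps would run on the node that received the query. As Mike notes, sending the requests and gathering the results is left to the application.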
