lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Grouping on multiple shards possible in lucene?
Date Wed, 21 Nov 2012 09:56:05 GMT
If you are only interested in sorting by doc addition order, then it should
be easy to reverse the doc order within each segment, using something like
IndexSorter.

Shai
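
To make the per-segment reversal concrete, here is a minimal sketch of the doc ID mapping it implies. This is not the IndexSorter implementation; the class and method names are hypothetical, and a plain list stands in for a segment's documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is not Lucene's IndexSorter.
final class ReverseDocOrderSketch {

    // Maps a doc ID to its position once a segment of maxDoc docs is reversed.
    // The mapping is its own inverse: applying it twice returns the input.
    static int reversedDocId(int docId, int maxDoc) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("doc ID out of range: " + docId);
        }
        return maxDoc - 1 - docId;
    }

    // Returns the documents of one segment in reverse addition order.
    static <T> List<T> reverseSegment(List<T> docsInAdditionOrder) {
        int maxDoc = docsInAdditionOrder.size();
        List<T> reversed = new ArrayList<>(maxDoc);
        for (int newId = 0; newId < maxDoc; newId++) {
            // The doc placed at newId is the one that used to sit at maxDoc - 1 - newId.
            reversed.add(docsInAdditionOrder.get(reversedDocId(newId, maxDoc)));
        }
        return reversed;
    }
}
```

With docs added as d1, d2, d3, `reverseSegment` yields d3, d2, d1, so a forward walk over the rewritten segment visits the newest doc first.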

On Wed, Nov 21, 2012 at 8:03 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Hi Shai,
>
> I only want to sort based on doc additions. E.g. for d1, d2, d3 the true
> sort order is d3, d2, d1. A doc-timestamp-based solution is much more
> involved, as you said.
>
> It's nice to know that you are already working on it and there will be a
> solution in the near future.
>
> In the meantime, I will live with good old sorting
>
> --
> Ravi
>
> On Wed, Nov 21, 2012 at 1:59 AM, Shai Erera <serera@gmail.com> wrote:
>
> > Hi Ravi,
> >
> > I've been dealing with reverse indexing lately, so let me share a bit of
> > my experience thus far.
> >
> > First, you need to define what reverse indexing means for you. If it
> > means that docs that were indexed in the order d1, d2, d3 should be
> > traversed during search in the order d3, d2, d1, then that's one thing.
> > However, if it means that the traversal needs to occur by e.g. the
> > documents' timestamp, as a means to process documents from latest to
> > oldest, then that's a totally different thing, and way more complicated.
> >
> > You will need to think about an IndexReader which reverses the order of
> > the segments that it reads, so that segments are processed from latest to
> > oldest. Also, you might need to merge the segments in reverse order too
> > (i.e. if segments s1, s4, s5 are merged, merge them as s5, s4, s1).
> >
> > If you are interested in timestamp based sorting, it gets complicated.
> > Documents flow in from multiple producers (e.g. a parallel crawler,
> > different processes which feed documents to the index, etc.) and are
> > usually processed by multiple consumers (indexing threads). That makes
> > sorting the index based on a timestamp difficult.
> >
> > Lucene used to have IndexSorter (before 4.0) which could sort an index by
> > a field. That was an offline process and if that's what you're after -- you
> > should do just that and forget about the rest. If however you're interested
> > in an on-line process, where documents are fed in some order and searched
> > in the exact true order (latest to oldest), that's a more complicated
> > solution -- I'm still working on it :).
> >
> > HTH
> >
> > Shai
> >
> > On Tue, Nov 20, 2012 at 5:37 PM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > But, I think it should be possible with some fun codec & merge policy
> > > & MultiReader magic, to have docIDs assigned in "reverse chronological
> > > order"
> > >
> > > Can you explain it a bit more? I was thinking perhaps we store absolute
> > > doc-ids instead of deltas to do reverse traversal. But this could waste a
> > > lot of storage.
> > >
> > > The default merge policy will merge adjacent segments, no? Is it going to
> > > disturb the ordering?
> > >
> > > --
> > > Ravi
> > >
> > > On Tue, Nov 20, 2012 at 5:19 PM, Michael McCandless <
> > > lucene@mikemccandless.com> wrote:
> > >
> > > > On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
> > > > <ravikumar.govindarajan@gmail.com> wrote:
> > > > > > Thanks Mike. Actually, I think I can eliminate sort-by-time, if I am
> > > > > > able to iterate postings in reverse doc-id order. Is this possible in
> > > > > > Lucene?
> > > >
> > > > Alas that is not easy to do in Lucene: the posting lists are encoded
> > > > in forward docID order.
> > > >
> > > > > But, I think it should be possible with some fun codec & merge policy
> > > > > & MultiReader magic, to have docIDs assigned in "reverse chronological
> > > > > order" ...
> > > >
> > > > > > Also, for a TopN query sorted by doc-id will the query terminate early?
> > > >
> > > > Actually, it won't!  But it really should ... you could make a
> > > > Collector that throws an exception once the N docs have been
> > > > collected?
> > > >
> > > > Mike McCandless
> > > >
> > > > http://blog.mikemccandless.com
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> >
>

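Mike's suggestion of a Collector that throws an exception once N docs have been collected can be sketched as follows. To stay self-contained this does not use Lucene: `DocCollector`, `EarlyTerminationException`, `FirstNCollector` and `firstN` are all hypothetical names, and the loop over an int array stands in for the searcher feeding matching doc IDs in forward order. In a real index the same pattern would presumably live in a subclass of Lucene's own Collector, with the exception caught around the search call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of early termination; no Lucene classes are used here.
final class EarlyTerminationSketch {

    // Stand-in for a search collector callback (assumption, not Lucene's API).
    interface DocCollector {
        void collect(int docId);
    }

    // Unchecked so it can propagate out of collect() through the search loop.
    static final class EarlyTerminationException extends RuntimeException {
        EarlyTerminationException() {
            // Used purely for control flow: no message, no cause,
            // suppression disabled, no stack trace captured.
            super(null, null, false, false);
        }
    }

    // Keeps the first `limit` doc IDs it sees, then aborts the search.
    static final class FirstNCollector implements DocCollector {
        private final int limit;
        private final List<Integer> hits = new ArrayList<>();

        FirstNCollector(int limit) {
            if (limit <= 0) {
                throw new IllegalArgumentException("limit must be positive");
            }
            this.limit = limit;
        }

        @Override
        public void collect(int docId) {
            hits.add(docId);
            if (hits.size() >= limit) {
                throw new EarlyTerminationException();
            }
        }

        List<Integer> hits() {
            return hits;
        }
    }

    // Simulated search: matching doc IDs arrive in forward doc ID order.
    static List<Integer> firstN(int[] matchingDocIds, int n) {
        FirstNCollector collector = new FirstNCollector(n);
        try {
            for (int docId : matchingDocIds) {
                collector.collect(docId);
            }
        } catch (EarlyTerminationException expected) {
            // Reached n hits; the remaining matches are deliberately skipped.
        }
        return collector.hits();
    }
}
```

Because postings are walked in forward doc ID order, the first N docs collected are the N lowest doc IDs. If doc IDs were assigned in reverse chronological order as discussed in the thread, those would be the N most recent docs.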