lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 10:51:28 GMT
I didn't really follow that thread either - but we didn't move to the
new Comp API because of its performance vs. the old.

- Mark

http://www.lucidimagination.com (mobile)

On Oct 20, 2009, at 4:22 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:

> I did not follow the whole thread, but I do not understand what's bad
> with the new API that justifies preserving the old one. The old API
> does not fit very well with the segment-based search, and a lot of
> ugly stuff was done around it to make both APIs work the same.
>
> For me it is not very complicated to create a new-style Comparator.  
> The only difference is that you have to implement more methods for  
> the comparison, but if you e.g. take the provided comparators for  
> the basic data types as a base, it is easy to understand how it  
> works and you can modify the examples.
>
> And: as far as I know, the old API is not really segment-wise, so
> reopen() cost is much higher and FieldCache gets slower, because the
> top-level reader must be reloaded into the cache rather than the
> individual segments.
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> From: Jake Mannix [mailto:jake.mannix@gmail.com]
> Sent: Tuesday, October 20, 2009 8:37 AM
> To: java-dev@lucene.apache.org
> Subject: Re: lucene 2.9 sorting algorithm
>
> Given that this new API is pretty unwieldy, and seems to not
> actually perform any better than the old one... are we going to
> consider revisiting that?
>
>   -jake
>
> On Mon, Oct 19, 2009 at 11:27 PM, Uwe Schindler <uwe@thetaphi.de>  
> wrote:
> The old search API is already removed in trunk…
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> From: John Wang [mailto:john.wang@gmail.com]
> Sent: Tuesday, October 20, 2009 3:28 AM
> To: java-dev@lucene.apache.org
> Subject: Re: lucene 2.9 sorting algorithm
>
> Hi Michael:
>
>      Was wondering if you got a chance to take a look at this.
>
>      Since deprecated APIs are being removed in 3.0, I was wondering
> if/when we would decide whether to keep the ScoreDocComparator API, so
> that it would be kept for Lucene 3.0.
>
> Thanks
>
> -John
>
> On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
> Oh, no problem...
>
> Mike
>
> On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.wang@gmail.com>  
> wrote:
> > Mike, just a clarification on my first perf report email.
> > The first section's numHits is incorrectly labeled; it should be 20 instead
> > of 50. Sorry about the possible confusion.
> > Thanks
> > -John
> >
> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >>
> >> Thanks John; I'll have a look.
> >>
> >> Mike
> >>
> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.wang@gmail.com>
> >> wrote:
> >> > Hi Michael:
> >> >     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector
> >> > as a more general case. I think keeping the old API for ScoreDocComparator
> >> > and SortComparatorSource would work.
> >> >   Please take a look.
> >> > Thanks
> >> > -John
> >> >
> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.wang@gmail.com> wrote:
> >> >>
> >> >> Hi Michael:
> >> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
> >> >>      I think I sent the https url instead, sorry.
> >> >>     The multi PQ sorting is fairly self-contained. I have 2 versions,
> >> >> 1 for string and 1 for int; each is a Collector impl.
> >> >>      I shouldn't say the Multi PQ is faster on int sort; it is within
> >> >> the error boundary. The diff is very, very small; I would say they are
> >> >> more or less equal.
> >> >>      If you think it is a good thing to go this way (if not for the
> >> >> perf, just for the simpler API), I'd be happy to work on a patch.
> >> >> Thanks
> >> >> -John
> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
> >> >> <lucene@mikemccandless.com> wrote:
> >> >>>
> >> >>> John, looks like this requires login -- any plans to open that up, or
> >> >>> post the code on an issue?
> >> >>>
> >> >>> How self-contained is your Multi PQ sorting?  EG is it a standalone
> >> >>> Collector impl that I can test?
> >> >>>
> >> >>> Mike
> >> >>>
> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.wang@gmail.com>
> >> >>> wrote:
> >> >>> > BTW, we have a little sandbox for these experiments, and all my
> >> >>> > test code is at the URL below. It is not very polished.
> >> >>> >
> >> >>> > https://lucene-book.googlecode.com/svn/trunk
> >> >>> >
> >> >>> > -John
> >> >>> >
> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.wang@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> Numbers Mike requested for Int types:
> >> >>> >>
> >> >>> >> Only the time/cputime numbers are posted; the others are all the same
> >> >>> >> since the algorithm is the same.
> >> >>> >>
> >> >>> >> Lucene 2.9:
> >> >>> >> numhits: 10
> >> >>> >> time: 14619495
> >> >>> >> cpu: 146126
> >> >>> >>
> >> >>> >> numhits: 20
> >> >>> >> time: 14550568
> >> >>> >> cpu: 163242
> >> >>> >>
> >> >>> >> numhits: 100
> >> >>> >> time: 16467647
> >> >>> >> cpu: 178379
> >> >>> >>
> >> >>> >>
> >> >>> >> my test:
> >> >>> >> numHits: 10
> >> >>> >> time: 14101094
> >> >>> >> cpu: 144715
> >> >>> >>
> >> >>> >> numHits: 20
> >> >>> >> time: 14804821
> >> >>> >> cpu: 151305
> >> >>> >>
> >> >>> >> numHits: 100
> >> >>> >> time: 15372157
> >> >>> >> cpu time: 158842
> >> >>> >>
> >> >>> >> Conclusions:
> >> >>> >> They are very similar; the differences are all within error bounds,
> >> >>> >> especially with lower PQ sizes, where the second sort alg is again
> >> >>> >> slightly faster.
> >> >>> >>
> >> >>> >> Hope this helps.
> >> >>> >>
> >> >>> >> -John
> >> >>> >>
> >> >>> >>
> >> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
> >> >>> >> <yonik@lucidimagination.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
> >> >>> >>> <lucene@mikemccandless.com> wrote:
> >> >>> >>> > Though it'd be odd if the switch to searching by segment
> >> >>> >>> > really was most of the gains here.
> >> >>> >>>
> >> >>> >>> I had assumed that much of the improvement was due to ditching
> >> >>> >>> MultiTermEnum/MultiTermDocs.
> >> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only
> >> >>> >>> helps with queries that use a TermEnum (range, prefix, etc).
> >> >>> >>>
> >> >>> >>> -Yonik
> >> >>> >>> http://www.lucidimagination.com
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> ---------------------------------------------------------------------
> >> >>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> >>> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >> >>> >>>
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >>
> >>
> >
> >
>
>
>
>
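To make the "more methods" point from the quoted discussion concrete, here is a minimal sketch of a slot-based, per-segment comparator. It is modeled loosely on the method set that Lucene 2.9's FieldComparator asks for (compare, setBottom, compareBottom, copy, setNextReader, value), but it is plain Java with no Lucene dependency. The names SlotComparator, IntSlotComparator, SlotComparatorDemo and setNextSegment are hypothetical, each segment is assumed to be just an int[] of sort values indexed by segment-local doc id, and the linear scan for the weakest slot stands in for a real priority queue.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-segment comparator contract; not Lucene's class.
interface SlotComparator {
    int compare(int slot1, int slot2);         // order two hits already held in slots
    void setBottom(int slot);                  // remember the weakest held hit
    int compareBottom(int doc);                // weakest held hit vs. a segment-local doc
    void copy(int slot, int doc);              // store the doc's sort value in a slot
    void setNextSegment(int[] segmentValues);  // switch to the next segment's values
    Comparable<?> value(int slot);             // expose the stored sort value
}

final class IntSlotComparator implements SlotComparator {
    private final int[] slots;           // one stored value per held hit
    private int[] current = new int[0];  // sort values of the segment being searched
    private int bottom;

    IntSlotComparator(int numHits) { slots = new int[numHits]; }

    public int compare(int slot1, int slot2) { return Integer.compare(slots[slot1], slots[slot2]); }
    public void setBottom(int slot) { bottom = slots[slot]; }
    public int compareBottom(int doc) { return Integer.compare(bottom, current[doc]); }
    public void copy(int slot, int doc) { slots[slot] = current[doc]; }
    public void setNextSegment(int[] segmentValues) { current = segmentValues; }
    public Comparable<?> value(int slot) { return slots[slot]; }
}

final class SlotComparatorDemo {
    // Collects the numHits smallest values across all segments, returned ascending.
    static int[] topN(int[][] segments, int numHits) {
        if (numHits <= 0) return new int[0];
        IntSlotComparator cmp = new IntSlotComparator(numHits);
        int filled = 0;
        int bottomSlot = -1;
        for (int[] segment : segments) {
            cmp.setNextSegment(segment);
            for (int doc = 0; doc < segment.length; doc++) {
                if (filled < numHits) {
                    cmp.copy(filled++, doc);               // still filling the slots
                    if (filled == numHits) {
                        bottomSlot = worstSlot(cmp, numHits);
                        cmp.setBottom(bottomSlot);
                    }
                } else if (cmp.compareBottom(doc) > 0) {   // doc beats the weakest held hit
                    cmp.copy(bottomSlot, doc);
                    bottomSlot = worstSlot(cmp, numHits);
                    cmp.setBottom(bottomSlot);
                }
            }
        }
        int[] out = new int[filled];
        for (int i = 0; i < filled; i++) out[i] = (Integer) cmp.value(i);
        Arrays.sort(out);
        return out;
    }

    // Linear scan for the slot holding the largest value (a simplification).
    static int worstSlot(SlotComparator cmp, int size) {
        int worst = 0;
        for (int s = 1; s < size; s++) {
            if (cmp.compare(s, worst) > 0) worst = s;
        }
        return worst;
    }
}
```

Calling SlotComparatorDemo.topN with a few int[] segments and a small numHits exercises every method of the contract. Only setNextSegment is called when the search crosses a segment boundary; the values already copied into slots stay valid.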

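The quoted messages do not spell out how the "multi PQ" collectors work. The sketch below assumes it means one bounded priority queue per segment, with the per-segment results merged at the end; that is a reading of the name, not something taken from John's code. MultiQueueSortSketch is a hypothetical name, segments are again plain int[] arrays of sort values, and the final merge is a simple sort of the surviving candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class MultiQueueSortSketch {
    // Returns the numHits smallest values, ascending, using one bounded queue per segment.
    static List<Integer> topN(int[][] segments, int numHits) {
        List<Integer> candidates = new ArrayList<>();
        if (numHits <= 0) return candidates;
        for (int[] segment : segments) {
            // Max-heap: the head is the weakest (largest) hit kept for this segment.
            PriorityQueue<Integer> queue =
                new PriorityQueue<>(numHits, Collections.reverseOrder());
            for (int value : segment) {
                if (queue.size() < numHits) {
                    queue.add(value);
                } else if (value < queue.peek()) {
                    queue.poll();      // drop the weakest hit
                    queue.add(value);  // keep the more competitive one
                }
            }
            candidates.addAll(queue);  // at most numHits survivors per segment
        }
        // Merge step: sort the survivors and keep the overall best numHits.
        Collections.sort(candidates);
        return new ArrayList<>(candidates.subList(0, Math.min(numHits, candidates.size())));
    }
}
```

Because each queue only ever sees values from its own segment, comparisons within a segment need no cross-segment bookkeeping. The cost is the extra merge at the end and up to numHits held entries per segment.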