lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 08:22:19 GMT
I did not follow the whole thread, but I do not understand what's bad with
the new API that rectifies to preserve the old one. The old API does not fit
very well with the segment based search and a lot of ugly stuff was done
around to make both APIs work the same.

 

For me it is not very complicated to create a new-style Comparator. The only
difference is that you have to implement more methods for the comparison,
but if you e.g. take the provided comparators for the basic data types as a
base, it is easy to understand how it works and you can modify the examples.

 

And: as far as I know, the old API is not really segment wise, so reopen()
cost is much higher and FieldCache gets slower, because the top level reader
must be reloaded into cache not the segments.

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Jake Mannix [mailto:jake.mannix@gmail.com] 
Sent: Tuesday, October 20, 2009 8:37 AM
To: java-dev@lucene.apache.org
Subject: Re: lucene 2.9 sorting algorithm

 

Given that this new API is pretty unweildy, and seems to not actually
perform any better than the old one... are we going to consider revisiting
that?

  -jake

On Mon, Oct 19, 2009 at 11:27 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

The old search API is already removed in trunk.

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: John Wang [mailto:john.wang@gmail.com] 
Sent: Tuesday, October 20, 2009 3:28 AM
To: java-dev@lucene.apache.org
Subject: Re: lucene 2.9 sorting algorithm

 

Hi Michael:

 

     Was wondering if you got a chance to take a look at this.

 

     Since deprecated APIs are being removed in 3.0, I was wondering if/when
we would decide on keeping the ScoreDocComparator API and thus would be kept
for Lucene 3.0.

 

Thanks

 

-John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

Oh, no problem...

Mike


On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.wang@gmail.com> wrote:
> Mike, just a clarification on my first perf report email.
> The first section, numHits is incorrectly labeled, it should be 20 instead
> of 50. Sorry about the possible confusion.
> Thanks
> -John
>
> On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Thanks John; I'll have a look.
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.wang@gmail.com> wrote:
>> > Hi Michael:
>> >     I added classes: ScoreDocComparatorQueue and
OneSortNoScoreCollector
>> > as
>> > a more general case. I think keeping the old api for ScoreDocComparator
>> > and
>> > SortComparatorSource would work.
>> >   Please take a look.
>> > Thanks
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.wang@gmail.com> wrote:
>> >>
>> >> Hi Michael:
>> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
>> >>      I think I sent the https url instead, sorry.
>> >>     The multi PQ sorting is fairly self-contained, I have 2 versions,
1
>> >> for string and 1 for int, each are Collector impls.
>> >>      I shouldn't say the Multi Q is faster on int sort, it is within
>> >> the
>> >> error boundary. The diff is very very small, I would stay they are
more
>> >> equal.
>> >>      If you think it is a good thing to go this way, (if not for the
>> >> perf,
>> >> just for the simpler api) I'd be happy to work on a patch.
>> >> Thanks
>> >> -John
>> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> >> <lucene@mikemccandless.com> wrote:
>> >>>
>> >>> John, looks like this requires login -- any plans to open that up,
or,
>> >>> post the code on an issue?
>> >>>
>> >>> How self-contained is your Multi PQ sorting?  EG is it a standalone
>> >>> Collector impl that I can test?
>> >>>
>> >>> Mike
>> >>>
>> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.wang@gmail.com>
>> >>> wrote:
>> >>> > BTW, we are have a little sandbox for these experiments. And all
my
>> >>> > testcode
>> >>> > are at. They are not very polished.
>> >>> >
>> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >>> >
>> >>> > -John
>> >>> >
>> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.wang@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Numbers Mike requested for Int types:
>> >>> >>
>> >>> >> only the time/cputime are posted, others are all the same since
the
>> >>> >> algorithm is the same.
>> >>> >>
>> >>> >> Lucene 2.9:
>> >>> >> numhits: 10
>> >>> >> time: 14619495
>> >>> >> cpu: 146126
>> >>> >>
>> >>> >> numhits: 20
>> >>> >> time: 14550568
>> >>> >> cpu: 163242
>> >>> >>
>> >>> >> numhits: 100
>> >>> >> time: 16467647
>> >>> >> cpu: 178379
>> >>> >>
>> >>> >>
>> >>> >> my test:
>> >>> >> numHits: 10
>> >>> >> time: 14101094
>> >>> >> cpu: 144715
>> >>> >>
>> >>> >> numHits: 20
>> >>> >> time: 14804821
>> >>> >> cpu: 151305
>> >>> >>
>> >>> >> numHits: 100
>> >>> >> time: 15372157
>> >>> >> cpu time: 158842
>> >>> >>
>> >>> >> Conclusions:
>> >>> >> The are very similar, the differences are all within error
bounds,
>> >>> >> especially with lower PQ sizes, which second sort alg again
>> >>> >> slightly
>> >>> >> faster.
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> -John
>> >>> >>
>> >>> >>
>> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >>> >> <yonik@lucidimagination.com>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> >>> <lucene@mikemccandless.com> wrote:
>> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> >>> > really was most of the gains here.
>> >>> >>>
>> >>> >>> I had assumed that much of the improvement was due to ditching
>> >>> >>> MultiTermEnum/MultiTermDocs.
>> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that
only
>> >>> >>> helps
>> >>> >>> with queries that use a TermEnum (range, prefix, etc).
>> >>> >>>
>> >>> >>> -Yonik
>> >>> >>> http://www.lucidimagination.com
>> >>> >>>
>> >>> >>>
>> >>> >>>
---------------------------------------------------------------------
>> >>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >>> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >>> >>>
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >>>
>> >>
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

 

 


Mime
View raw message