incubator-lucy-dev mailing list archives

From Michael McCandless <>
Subject Re: SortCollector
Date Thu, 21 May 2009 18:32:46 GMT
On Tue, May 19, 2009 at 8:18 PM, Marvin Humphrey <> wrote:

>> Float/Double will have to compute separate ords (at indexing time)?
>> (Lucene just uses the float/double values themselves).
> We should probably just use values.  That means we'll have to deal with endian
> issues on every comparison, though, since we're reading directly off of disk
> rather than pre-loading.  Same thing with integers, including datestamp types.
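The decode-on-compare cost being described might look like the following minimal sketch, assuming big-endian on-disk storage; the helper names are hypothetical, not Lucy's actual API:

```c
#include <stdint.h>

/* Hypothetical helper: decode a big-endian uint64 read straight off a
 * memory-mapped sort cache.  Because values aren't pre-loaded into
 * native byte order, every comparison pays this decode cost. */
static inline uint64_t
SI_decode_bigend_u64(const uint8_t *buf) {
    uint64_t retval = 0;
    for (int i = 0; i < 8; i++) {
        retval = (retval << 8) | buf[i];
    }
    return retval;
}

static inline int
SI_compare_bigend_u64(const uint8_t *a_ptr, const uint8_t *b_ptr) {
    uint64_t a = SI_decode_bigend_u64(a_ptr);
    uint64_t b = SI_decode_bigend_u64(b_ptr);
    return a < b ? -1 : a == b ? 0 : 1;
}
```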


> (Incidentally, we have a related inefficiency when loading text sort cache
> values: for security reasons, we'll have to check UTF-8 validity on each load
> op.)
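For reference, a per-load validity check of the kind being described might look like this simplified sketch (hypothetical name; a security-grade validator must also reject overlong forms, surrogates, and code points above U+10FFFF):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified UTF-8 structural check: verifies lead bytes and
 * continuation bytes only.  Overlong-form and surrogate rejection,
 * required for a real security check, is omitted for brevity. */
static int
S_utf8_valid(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t head = buf[i++];
        int trailing;
        if      (head < 0x80)           { trailing = 0; }
        else if ((head & 0xE0) == 0xC0) { trailing = 1; }
        else if ((head & 0xF0) == 0xE0) { trailing = 2; }
        else if ((head & 0xF8) == 0xF0) { trailing = 3; }
        else                            { return 0; }  /* bad lead byte */
        while (trailing--) {
            if (i >= len || (buf[i++] & 0xC0) != 0x80) {
                return 0;  /* truncated or bad continuation byte */
            }
        }
    }
    return 1;
}
```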


>> Requiring custom comparators to compile down to ords at indexing time
>> is possibly too restrictive?
> I expect the common use case will be custom-sorted text fields.  The user
> wouldn't have to know anything about the ords; we'd generate all that behind
> the scenes after sorting using their custom Compare_Values() method.


> What other use cases do we anticipate?

I want to compare according to some private structure I maintain
myself (on disk or RAM)?

>> > Third, the Lucene code makes method calls for each comparison, but for ord
>> > comparisons, we can use static inline functions which resolve to array lookups.
>> > (Coding convention: I use the prefix "SI_" for all static inline functions,
>> > the prefix "S_" for static functions -- so SI_compare_by_ord2() and friends
>> > are static inline.)
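As an illustration of that convention (a sketch, not the actual Lucy source), an ord comparison reduces to two array loads:

```c
#include <stdint.h>

/* Sketch of an SI_-prefixed static inline ord comparison: with ords
 * precomputed at index time, comparing two docs is just two array
 * lookups, with no function-call overhead once inlined. */
static inline int32_t
SI_compare_by_ord(const int32_t *ords, int32_t doc_a, int32_t doc_b) {
    int32_t a = ords[doc_a];
    int32_t b = ords[doc_b];
    return a < b ? -1 : a == b ? 0 : 1;
}
```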
>> But you're trading off a switch statement instead...
> A switch statement which compiles to a jump table is cheaper than a method
> call -- no function call overhead, and except for the scoring case, these are
> leaf blocks.
> The goal for SortCollector is not only to avoid code duplication.  I'm also
> hoping to write something *faster*.

I think a switch statement needs to do bounds checking of its arg?  But,
yeah, it's likely cheaper than a method call that's not inlined.  Maybe
we should test out a switch statement instead, from javaland...

>> and since you're not specializing, you're eg making the common case (sort by
>> one field) pay the looping overhead of the uncommon case.
> A "do-while" loop was chosen over a "for" loop so that we wouldn't incur
> looping overhead costs unless the first comparison produced a tie.


> Even better, I've figured out how to incorporate the "queue full" check into
> the switch: we start out with the first comparator type set to AUTO_ACCEPT,
> then change it once the queue is full.
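Pulling those pieces together, here is a speculative sketch (all names and types are hypothetical, not the actual SortCollector code) of a do-while over an actions array whose first slot starts as AUTO_ACCEPT and is swapped for a real comparison once the queue fills:

```c
#include <stdbool.h>
#include <stdint.h>

enum { AUTO_ACCEPT, COMPARE_BY_ORD, COMPARE_BY_DOC_ID };

typedef struct {
    uint8_t        actions[3];
    uint32_t       num_actions;
    const int32_t *ords;       /* per-doc ordinals for the sort field */
    int32_t        bottom_doc; /* weakest entry currently in the queue */
} SortContext;

/* do-while dispatch: no looping overhead unless the first comparison
 * ties, and the queue-full check is folded into the switch itself. */
static bool
S_competitive(SortContext *ctx, int32_t doc_id) {
    uint32_t i = 0;
    do {
        switch (ctx->actions[i]) {
            case AUTO_ACCEPT:
                return true;  /* queue not yet full: accept everything */
            case COMPARE_BY_ORD: {
                int32_t a = ctx->ords[doc_id];
                int32_t b = ctx->ords[ctx->bottom_doc];
                if (a < b) { return true; }
                if (a > b) { return false; }
                break;  /* tie: fall through to the next comparator */
            }
            case COMPARE_BY_DOC_ID:
                return doc_id < ctx->bottom_doc;
        }
    } while (++i < ctx->num_actions);
    return false;  /* all comparators tied */
}
```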

Sweet!  A kind of "runtime/live" polymorphism...

> With regards to the "common case (sort by one field)"... it probably makes
> sense to consider two common cases:
>  * Sort by one field then doc id.
>  * Sort by score then doc id.
> The only savings we get by breaking out a single-comparator subclass are:
>  * A single iterator variable initialization (i = 0).
>  * A single array deref at the top (actions[i]).
>  * Loop termination overhead -- an increment, a dereferenced load, and a
>    compare op -- but only in the event of a tie.
> I just don't see how that matters so much that we need to double our class
> count.

OK, but with specialization such doubling shouldn't matter.

> With regards to sort-by-score.... We have an opportunity to achieve meaningful
> gains with specialized collection for term scorers once we implement PFOR, and
> it's counter-productive to spend time writing, testing, and maintaining a
> TopDocCollector class that doesn't go far enough.

Other cases gain from specialization too.

>> > I found one additional inefficiency in the Lucene implementation: score() is
>> > called twice for "competitive" docs.  (This isn't true for
>> > TopScoreDocCollector, which is the main class for collecting scored docs --
>> > it's only true for the TopFieldDoc implementations that also track score). It
>> > would be nice to avoid that, but it probably doesn't amount to much.
>> Please give details on where that's happening... we tried not to do
>> that.
> This was an error.  I missed the ScoreCachingWrapperScorer aspect of
> RelevanceComparator.

Actually it wasn't an error... and I'm glad you didn't say so until
now :)  Because your assertion caused Shai to go scrutinize, and he
did find a case to fix, which we assumed was what you were referring to.

>> > For ordinary searches, sort order is determined by score, with doc num
>> > breaking ties.  This behavior can be achieved using SortCollector like so:
>> >
>> >    if (!sort_spec) {
>> >        self->comparator_types[0] = HitComp_BY_SCORE;
>> >        self->comparator_types[1] = HitComp_BY_DOC;
>> >    }
>> The thing is, often that tie breaking isn't needed (since most Scorers
>> visit docs in order) in the first and by far most frequently executed
>> comparison (compareBottom in Lucene), that tests whether the hit
>> is even competitive.
> Yes, you're right.
> I've changed the implementation to accommodate that point.  HitQueue now tacks
> on the COMPARE_BY_DOC_ID action, but SortCollector doesn't.  SortCollector's
> SI_competitive() test returns false if all comparators tie.
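A minimal sketch of that tie policy for the single-comparator case (the real SI_competitive() presumably dispatches over all comparators; this just shows why ties can return false):

```c
#include <stdbool.h>
#include <stdint.h>

/* With in-order collection, a doc that ties the bottom entry on every
 * sort field would also lose the implicit doc-id tiebreak, so the
 * collector can report "not competitive" without comparing doc ids. */
static inline bool
SI_competitive(const int32_t *ords, int32_t doc_id, int32_t bottom_doc) {
    return ords[doc_id] < ords[bottom_doc];  /* a tie returns false */
}
```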

Just be careful if you ever allow out-of-order collection (which for
Lucene at least gives better performance for certain BooleanQueries).

>> > For most Scorers, the
>> > cost of calling Scorer_Next() and Scorer_Score(), plus the overhead of calling
>> > SortCollector_Collect() itself, will swamp those savings.
>> I find this logic dangerous and somewhat "self-fulfilling".  To say
>> "it's in the noise so we won't bother saving it" eventually leads over
>> time to more and more noise, thus perpetuating that reasoning.  Savings
>> are savings.  Think of how much electricity you're about to spend...
> Why spend time on stuff that doesn't matter?

In fact I think they do matter. I guess that's our difference.  You
call the extra ops "noise", but I call them "signal".  They are in the
absolute hotspots of searching.

If I thought they didn't matter, I too wouldn't want to spend time.

> An opposing philosophy, popular in dynamic language development communities,
> prioritizes developer time over CPU time:
>  Step 1: Write code that's correct.
>  Step 2: If it's not fast enough, profile it to identify trouble spots
>          and work on the worst offenders first.  Stop optimizing as soon as
>          the code is fast enough.
> I don't think that approach is appropriate for systems programming, but it
> highlights an important point: developer time is a finite resource.

Actually, I agree with that logic, and I think it does apply to
"systems" programming.

> We seem to agree that working hard to get the implementation as
> theoretically efficient as possible on the first go-round is important.
> Where we seem to diverge is what happens when there's a tradeoff between
> theoretical efficiency and code simplicity.


> However, since you aren't advocating wholesale porting to
> processor-specific assembler, the difference of opinion is only of
> degree.

I think specialization is almost "have your cake and eat it too".  I
say almost because... the specializer itself is hideously complex (in
my case), currently...

> In my opinion, aggressive unrolling only makes sense for end-of-life code
> bases where no further algorithmic improvements are possible.  Before that,
> it's a distraction.
> This SortCollector/TopDocCollector dichotomy is a good example.  I don't want
> to waste time writing, testing, maintaining, and ultimately deprecating a
> TopDocCollector which saves a few meaningless cycles when we could be focusing
> on how to speed single-term queries with scored collection that works right up
> against mmap'd posting files and eliminates all function-call overhead.
> Until we get a juicy patch like that, I'm content to leave a single
> SortCollector in place, with its reasonably simple and easy-to-grok
> implementation taunting optimization-crazed developers and daring them to do
> better.

I understand your position.

> Compilers are sometimes the only way to solve certain problems, and sometimes
> they are the best solutions -- I certainly believe that's the case with
> Boilerplater and the autogeneration of Lucy bindings.  However, compilers
> introduce a layer of indirection between the programmer and the code that can
> make the implementation difficult to spelunk or debug, so equivalent literal
> code is often preferable.

Agreed, there is a clear cost to specialization.

>> You can push it further: take that bottom value comparison and push
>> it down into your TermDocs iteration as it reads the docs from PFOR.
> I like it!  :)

You can go still further: for searches expected to hit many results
(such that testing bottom becomes a highly restrictive test) you
should use skipping on those values.  IE, skip straight to the
next doc that has a value that's competitive, then ask the Scorer if
that doc passes.
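A toy sketch of that skipping idea (hypothetical names; a real implementation would use skip data rather than a linear scan, and "competitive" here assumes an ascending sort where smaller values win):

```c
#include <stdint.h>

/* Advance past docs whose sort value can't beat the current bottom,
 * returning the first potentially competitive doc, or num_docs if
 * none remains.  Only then would the Scorer be consulted. */
static int32_t
S_skip_to_competitive(const int32_t *values, int32_t num_docs,
                      int32_t doc, int32_t bottom_value) {
    while (doc < num_docs && values[doc] >= bottom_value) {
        doc++;  /* linear scan stands in for real skip-list jumps */
    }
    return doc;
}
```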

