incubator-lucy-dev mailing list archives

From Michael McCandless <>
Subject Re: SortCollector
Date Fri, 01 May 2009 21:45:26 GMT
On Fri, May 1, 2009 at 11:48 AM, Marvin Humphrey <> wrote:

> In LUCENE-1593, a further doubling of the class count in TopFieldCollector to
> 12 is being contemplated.

Those 12 classes are private and each is tiny.

Ideally, in the future, I would love to always favor code simplicity
(as you generally do with Lucy) in Lucene's core and then rely on
source code specialization (LUCENE-1594) to write the
ugly-but-super-fast code.

But we don't have mature specialization today, and so lacking that
we're choosing to do some of it (splintering into N classes) in the core.

> SortCollector will have two main components: a priority queue, and an
> array of HitComparator objects.  It will behave similarly to the last class in
> TopFieldCollector, MultiComparatorScoringNoMaxScoreCollector.  Determining
> whether a document is "competitive" and should be added to the inner queue
> involves iterating over the comparators until one of them returns a non-zero
> comparison.  Something like this:
>    u32_t i = 0;
>    i32_t comparison;
>    u8_t *const comparator_types = self->comparator_types;
>    const u32_t num_comparators  = self->num_comparators;
>    HitComparator **const comparators = self->comparators;
>    do {
>        switch (comparator_types[i]) {
>            case HitComp_BY_SCORE:
>                comparison = SI_compare_by_score(a, b);
>                break;
>            case HitComp_BY_SCORE_REV:
>                comparison = SI_compare_by_score(b, a);
>                break;
>            case HitComp_BY_DOC:
>                comparison = SI_compare_by_doc(a, b);
>                break;
>            case HitComp_BY_DOC_REV:
>                comparison = SI_compare_by_doc(b, a);
>                break;
>            case HitComp_BY_ORD2:
>                comparison = SI_compare_by_ord2(comparators[i], a, b);
>                break;
>            case HitComp_BY_ORD2_REV:
>                comparison = SI_compare_by_ord2(comparators[i], b, a);
>                break;
>            case HitComp_BY_ORD4:
>                comparison = SI_compare_by_ord4(comparators[i], a, b);
>                break;
>            case HitComp_BY_ORD4_REV:
>                comparison = SI_compare_by_ord4(comparators[i], b, a);
>                break;
>            case HitComp_BY_ORD8:
>                comparison = SI_compare_by_ord8(comparators[i], a, b);
>                break;
>            case HitComp_BY_ORD8_REV:
>                comparison = SI_compare_by_ord8(comparators[i], b, a);
>                break;
>            case HitComp_BY_ORD16:
>                comparison = SI_compare_by_ord16(comparators[i], a, b);
>                break;
>            case HitComp_BY_ORD16_REV:
>                comparison = SI_compare_by_ord16(comparators[i], b, a);
>                break;
>            case HitComp_BY_ORD32:
>                comparison = SI_compare_by_ord32(comparators[i], a, b);
>                break;
>            case HitComp_BY_ORD32_REV:
>                comparison = SI_compare_by_ord32(comparators[i], b, a);
>                break;
>            default:
>                comparison = 0;
>                THROW("UNEXPECTED comparator type %u8", comparator_types[i]);
>        }
>    } while (comparison == 0 && ++i < num_comparators);

Float/Double will have to compute separate ords (at indexing time)?
(Lucene just uses the float/double values themselves).

Do you plan on allowing arbitrary bit-sized ords, not just
2/4/8/16/32?  (We've discussed that).

Requiring custom comparators to compile down to ords at indexing time
is possibly too restrictive?

> This algo has several advantages over the current Lucene implementation.
> First off, tracking "max score" is a Lucene legacy requirement, so Lucy
> doesn't need it.

That's nice. That would've brought us back down to 6 ;)

> Second, Lucy pushes the bulk of sort costs back into index time.  All sortable
> fields are required to provide an mmap'd array of ords.  Even if we add new
> binary FieldTypes like Int32FieldType or add support for custom sort
> comparators, we will require the new sort types to provide ord arrays.

How do you handle string fields?  You're working per-segment, right?

> Third, the Lucene code makes method calls for each comparison, but for ord
> comparisons, we can use static inline functions which resolve to array lookups.
> (Coding convention: I use the prefix "SI_" for all static inline functions, and
> the prefix "S_" for static functions -- so SI_compare_by_ord2() and friends are
> static inline.)

But you're trading off a switch statement instead... and since you're
not specializing, you're, e.g., making the common case (sort by one field)
pay the looping overhead of the uncommon case.

> Fourth, we might get a small gain by factoring the "reverse" argument to
> SortSpec into the HitComparator "type", which is a u8_t.  So long as we limit
> the total number of branches in the switch() statement to less than 256, it
> should compile to a jump table, so reversing sort order will be cost-free:
>    case HitComp_BY_ORD2:
>        comparison = SI_compare_by_ord2(comparators[i], a, b);
>        break;
>    case HitComp_BY_ORD2_REV:
>        comparison = SI_compare_by_ord2(comparators[i], b, a);
>        break;
> Lucene performs a multiply instead:
>    final int c = reverseMul[i] * comparators[i].compareBottom(doc);
> Multiply ops are cheap, though, so I'd be surprised if that turned out to make
> a measurable difference.

Nice.  Maybe we could double our comparators with this approach ;)
(In LUCENE-1594 I plan on doing the same thing.)

> I found one additional inefficiency in the Lucene implementation: score() is
> called twice for "competitive" docs.  (This isn't true for
> TopScoreDocCollector, which is the main class for collecting scored docs --
> it's only true for the TopFieldDoc implementations that also track score). It
> would be nice to avoid that, but it probably doesn't amount to much.

Please give details on where that's happening... we tried not to do
that.

> For ordinary searches, sort order is determined by score, with doc num
> breaking ties.  This behavior can be achieved using SortCollector like so:
>    if (!sort_spec) {
>        self->comparator_types[0] = HitComp_BY_SCORE;
>        self->comparator_types[1] = HitComp_BY_DOC;
>    }

The thing is, often that tie breaking isn't needed (since most Scorers
visit docs in order) in the first and by far most frequently executed
comparison (compareBottom in Lucene), that tests whether the hit
is even competitive.

> The only difference between such a SortCollector and TopDocCollector is that
> in TopDocCollector, the comparison is hard-wired and you save 2-4 pointer
> dereferences, 2-4 array lookups, and 1-2 switch() ops.  For most Scorers, the
> cost of calling Scorer_Next() and Scorer_Score(), plus the overhead of calling
> SortCollector_Collect() itself, will swamp those savings.

I find this logic dangerous and somewhat "self-fulfilling".  To say
"it's in the noise so we won't bother saving it" eventually leads over
time to more and more noise, thus perpetuating that reasoning.  Savings
are savings.  Think of how much electricity you're about to spend...

> However, I question whether that should be a priority.  If we really want to
> max out performance for simple queries, we need to eliminate method
> calls altogether, get right next to mmap'd, PFOR-encoded posting data, and
> collect hits into a purpose-built priority queue operating with mostly static
> inline functions.  A "Matcher_TopDocs()" method, perhaps.
> Speaking of purpose-built priority queues...
> It might also make sense to move the priority queue code which powers HitQueue
> into SortCollector.  We can then hard-code it for dealing with our particular
> sort algo, so that PriQ_Less_Than() becomes a static inline function.
> I don't expect much of a speedup, if any, from inlining HitQueue within
> SortCollector.  However, doing things that way allows us to avoid duplicating
> the above comparison code which iterates over the array of HitComparator
> objects.
> Instead, we duplicate the priority queue code -- but since the heap sort algo
> is decades old, the small amount of code required to implement it is likely to
> stay stable.

This sounds like a baby step towards source code
specialization... you're conflating matching & collection (maybe only
for TermQuery?).

You can push it further: take that bottom value comparison and push
it down into your TermDocs iteration as it reads the docs from PFOR.

