Return-Path: Delivered-To: apmail-lucene-lucy-dev-archive@minotaur.apache.org Received: (qmail 24413 invoked from network); 1 May 2009 15:49:25 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 1 May 2009 15:49:25 -0000 Received: (qmail 60851 invoked by uid 500); 1 May 2009 15:49:25 -0000 Delivered-To: apmail-lucene-lucy-dev-archive@lucene.apache.org Received: (qmail 60798 invoked by uid 500); 1 May 2009 15:49:25 -0000 Mailing-List: contact lucy-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: lucy-dev@lucene.apache.org Delivered-To: mailing list lucy-dev@lucene.apache.org Received: (qmail 60788 invoked by uid 99); 1 May 2009 15:49:25 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 May 2009 15:49:25 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [68.116.38.202] (HELO rectangular.com) (68.116.38.202) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 May 2009 15:49:19 +0000 Received: from marvin by rectangular.com with local (Exim 4.63) (envelope-from ) id 1Lzuyv-0007Sm-MJ for lucy-dev@lucene.apache.org; Fri, 01 May 2009 08:48:57 -0700 Date: Fri, 1 May 2009 08:48:57 -0700 To: lucy-dev@lucene.apache.org Subject: SortCollector Message-ID: <20090501154857.GA28441@rectangular.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.13 (2006-08-11) From: Marvin Humphrey X-Virus-Checked: Checked by ClamAV on apache.org As I've been working up segment-centric sorted search for Lucy, I've been spelunking the Lucene files TopScoreDocCollector.java and TopFieldCollector.java. I've also been following the JIRA issue LUCENE-1593 "Optimizations to TopScoreDocCollector and TopFieldCollector" (though the discussion there has gotten very long and drifted off-topic, so I don't necessarily recommend it). It's hard to wrap your head around the code in TopScoreDocCollector.java and TopFieldCollector.java until you realize that there's much less there than meets the eye -- those two files are an egregious violation against the principle of DRY, with lots of copy-and-paste code. TopFieldCollector.java contains 6 classes which are nearly the same. * OneComparatorNonScoringCollector * OneComparatorScoringNoMaxScoreCollector * OneComparatorScoringMaxScoreCollector * MultiComparatorNonScoringCollector * MultiComparatorScoringMaxScoreCollector * MultiComparatorScoringNoMaxScoreCollector This has been done because the collect() method in Collector subclasses is used in tight loops, and what could have been a single class has been unrolled for the sake of shaving CPU cycles on each loop iter. In LUCENE-1593, a further doubling of the class count in TopFieldCollector to 12 is being contemplated, bringing the total number of near-identical classes to 13 when you include TopScoreDocCollector. I hope we can achieve the same ends in Lucy with at most 2 classes: "SortCollector" and "TopDocCollector". Maybe even just SortCollector. SortCollector will have have two main components: a priority queue, and an array of HitComparator objects. It will behave similarly to the last class in TopFieldColector, MultiComparatorScoringNoMaxScoreCollector. Determining whether a document is "competitive" and should be added to the inner queue involves iterating over the comparators until one of them returns a non-zero comparison. Something like this: u32_t i = 0; i32_t comparison; u8_t *const comparator_types = self->comparator_types; const u32_t num_comparators = self->num_comparators; HitComparator **const comparators = self->comparators; do { switch (comparator_types[i]) { case HitComp_BY_SCORE: comparison = SI_compare_by_score(a, b); break; case HitComp_BY_SCORE_REV: comparison = SI_compare_by_score(b, a); break; case HitComp_BY_DOC: comparison = SI_compare_by_doc(a, b); break; case HitComp_BY_DOC_REV: comparison = SI_compare_by_doc(b, a); break; case HitComp_BY_ORD2: comparison = SI_compare_by_ord2(comparators[i], a, b); break; case HitComp_BY_ORD2_REV: comparison = SI_compare_by_ord2(comparators[i], b, a); break; case HitComp_BY_ORD4: comparison = SI_compare_by_ord4(comparators[i], a, b); break; case HitComp_BY_ORD4_REV: comparison = SI_compare_by_ord4(comparators[i], b, a); break; case HitComp_BY_ORD8: comparison = SI_compare_by_ord8(comparators[i], a, b); break; case HitComp_BY_ORD8_REV: comparison = SI_compare_by_ord8(comparators[i], b, a); break; case HitComp_BY_ORD16: comparison = SI_compare_by_ord16(comparators[i], a, b); break; case HitComp_BY_ORD16_REV: comparison = SI_compare_by_ord16(comparators[i], b, a); break; case HitComp_BY_ORD32: comparison = SI_compare_by_ord32(comparators[i], a, b); break; case HitComp_BY_ORD32_REV: comparison = SI_compare_by_ord32(comparators[i], b, a); break; default: comparison = 0; THROW("UNEXPECTED comparator type %u8", comparator_types[i]); } } while (comparison == 0 && ++i < num_comparators); This algo has several advantages over the current Lucene implementation. First off, tracking "max score" is a Lucene legacy requirement, so Lucy doesn't need it. Second, Lucy pushes the bulk of sort costs back into index time. All sortable fields are required to provide an mmap'd array of ords. Even if we add new binary FieldTypes like Int32FieldType or add support for custom sort comparators, we will require the new sort types to provide ord arrays. Third, the Lucene code makes method calls for each comparison, but for ord comparisons, we can use static inline functions which resolve to array lookups. (Coding convention: I use the prefix "SI_" for all static inline functions, and the prefix "S_" for static functions -- so SI_compare_by_ord2() and friends are static inline.) Fourth, we might get a small gain by factoring the "reverse" argument to SortSpec into the HitComparator "type", which is a u8_t. So long as we limit the total number of branches in the switch() statement to less than 256, it should compile to a jump table, so reversing sort order will be cost-free: case HitComp_BY_ORD2: comparison = SI_compare_by_ord2(comparators[i], a, b); break; case HitComp_BY_ORD2_REV: comparison = SI_compare_by_ord2(comparators[i], b, a); break; Lucene performs a multiply instead: final int c = reverseMul[i] * comparators[i].compareBottom(doc); Multiply ops are cheap, though, so I'd be surprised if that turned out to make a measurable difference. I found one additional inefficiency in the Lucene implementation: score() is called twice for "competitive" docs. (This isn't true for TopScoreDocCollector, which is the main class for collecting scored docs -- it's only true for the TopFieldDoc implementations that also track score). It would be nice to avoid that, but it probably doesn't amount to much. For ordinary searches, sort order is determined by score, with doc num breaking ties. This behavior can be achieved using SortCollector like so: if (!sort_spec) { self->comparator_types[0] = HitComp_BY_SCORE; self->comparator_types[1] = HitComp_BY_DOC; } The only difference between such a SortCollector and TopDocCollector is that in TopDocCollector, the comparison is hard-wired and you save 2-4 pointer dereferences, 2-4 array lookups, and 1-2 switch() ops. For most Scorers, the cost of calling Scorer_Next() and Scorer_Score(), plus the overhead of calling SortCollector_Collect() itself, will swamp those savings. However, a pure TermScorer, which operates its own custom collection loop, might in theory show a measurable difference, theoretically justifying the addition of TopDocCollector pending the outcome of benchmarking. However, I question whether that should be a priority. If we really want to max out performance for simple queries, we need to avoid eliminate method calls altogether, get right next to mmap'd, PFOR-encoded posting data, and collect hits into a purpose-built priority queue operating with mostly static inline functions. A "Matcher_TopDocs()" method, perhaps. Speaking of purpose-built priority queues... It might also make sense to move the priority queue code which powers HitQueue into SortCollector. We can then hard-code it for dealing with our particular sort algo, so that PriQ_Less_Than() becomes a static inline function. I don't expect much of a speedup, if any, from inlining HitQueue within SortCollector. However, doing things that way allows us to avoid duplicating the above comparison code which iterates over the array of HitComparator objects. Instead, we duplicate the priority queue code -- but since the heap sort algo is decades old, the small amount of code required to implement it is likely to stay stable. Marvin Humphrey