lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] SearchServer / ClusterSearcher - massive performance hit
Date Thu, 25 Oct 2012 15:12:01 GMT
On Thu, Oct 25, 2012 at 1:35 AM, Dag Lem <dag@nimrod.no> wrote:
> Even though ClusterSearcher is implemented in Perl, I don't see that
> the C function top_docs would be calling back into Perl space here,
> and thus I still don't understand the (big) discrepancy.

As a matter of fact, we **are** calling back into Perl-space.  :)

LucyX::Remote::ClusterSearcher is a pure-Perl subclass of
Lucy::Search::Searcher, so it inherits all of Searcher's methods including
hits().  The Perl-space function Lucy::Search::Searcher::hits is an XS wrapper
around the Searcher_hits() C function I pointed you at earlier
(<http://s.apache.org/vH>), which contains this line:

    TopDocs *top_docs = Searcher_Top_Docs(self, real_query, wanted,
                                          sort_spec);

That `Searcher_Top_Docs()` call is actually a **method** invocation -- and
since `self` isa LucyX::Remote::ClusterSearcher, the subroutine that gets
dispatched is a callback to the pure Perl function
LucyX::Remote::ClusterSearcher::top_docs.

How does Lucy know about Perl-space subroutines, and how does the callback
work?  Well, Lucy is built on top of Clownfish, an object toolkit which is
designed to facilitate things like this.  You can write a pure-Perl subclass
of a parent class which is implemented in C and it will Just Work -- which
sure comes in handy for rapid prototyping!
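To make the dispatch concrete, here is a minimal sketch in C of the general idea. It is not Clownfish's actual implementation: every name in it (`SearcherVTable`, `host_top_docs`, `searcher_hits`, and so on) is made up for illustration, and the "callback" merely counts how often it is entered rather than calling into a real Perl interpreter. What it shows is that a line which reads like a plain C function call can land in a subclass's override when the call goes through a per-object method table.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; none of these names
 * come from the Lucy or Clownfish sources. */
typedef struct Searcher Searcher;

typedef struct {
    /* One slot per overridable method. */
    int (*top_docs)(Searcher *self, int wanted);
} SearcherVTable;

struct Searcher {
    const SearcherVTable *vtable;
    int callbacks;  /* how many times we left "pure C" for the override */
};

/* Default implementation: stays entirely in C. */
static int c_top_docs(Searcher *self, int wanted) {
    (void)self;
    return wanted;
}

/* Stand-in for the trampoline a host-language subclass would install.
 * A real one would marshal the arguments, invoke the Perl sub, and
 * convert the return value back; here we only count the trip. */
static int host_top_docs(Searcher *self, int wanted) {
    self->callbacks++;
    return wanted;
}

static const SearcherVTable BASE_VTABLE     = { c_top_docs };
static const SearcherVTable SUBCLASS_VTABLE = { host_top_docs };

/* Simplified analogue of a hits() routine: the top_docs call is
 * resolved through the object's vtable, not bound at compile time. */
static int searcher_hits(Searcher *self, int offset, int num_wanted) {
    int wanted = offset + num_wanted;
    return self->vtable->top_docs(self, wanted);
}
```

With an object whose vtable is `BASE_VTABLE`, `searcher_hits()` never leaves C and `callbacks` stays at 0. With `SUBCLASS_VTABLE`, the very same `searcher_hits()` code increments `callbacks` on every call, which is the analogue of paying for a trip into Perl-space each time.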

> In any case, just to rule out any *really* crazy stuff, I did the test
> you suggested above. Here, top_docs() was a tiny bit faster than
> hits(), as should be expected. I have pasted the test program for
> this at the end of this email. I peeked at Searcher.c and Lucy.xs to
> work out the equivalent Perl code for hits(); I hope I got it right.

I gave it a quick look-see, and your code looks like an accurate port --
kudos!

What I was suggesting was something slightly different though:

     if ($top_docs) {
         # Distinct name so the result does not shadow the $top_docs flag.
         my $result = $searcher->top_docs(query      => $real_query,
                                          num_wanted => $wanted);
     }
     else {
         $hits = $searcher->hits(
                                 query      => $query,
                                 offset     => $offset,
                                 num_wanted => $num_wanted,
                                 );
     }

I would not expect `hits()` to be faster in this case -- either for an
IndexSearcher or a ClusterSearcher -- because `hits()` calls `top_docs()`
as described above.

Marvin Humphrey
