lucy-user mailing list archives

From goran kent <>
Subject [lucy-user] Major performance degradation with remote search
Date Fri, 04 Nov 2011 13:09:42 GMT

I'm praying I'm simply missing something, because I've just run the
first tests against a subset of our cluster (10 machines) using remote
search, and the performance is, well, a deal killer.  It gets even
worse as I add machines to the cluster, which makes no sense, I know,
so read on.

Individually, the cluster nodes rip through their indexes in, say,
0.1 - 0.3 secs once their caches are warmed up.

However, when I wrap everything together and call all the nodes from a
single search machine (ie, to collect and display the results), the
total time is the sum of all the remote machines' individual times.  In
other words, if each search node in the cluster takes 0.3s to complete
(supposedly in parallel), then the total time for an N-node cluster is
N x 0.3s: 3s for 10 nodes, 6s for 20, 9s for 30, and so on.
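
The arithmetic above can be sketched in plain Perl (a minimal
simulation; the 0.3s figure and the ten-node count are taken from the
example above, and nothing here touches Lucy itself):

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical per-node response times in seconds (all 0.3s, as in
# the example).
my @node_latency = (0.3) x 10;

# If the aggregator queries the nodes one after another (serial
# fan-out), wall-clock time is the SUM of the per-node latencies.
my $serial_total = sum @node_latency;

# If the nodes were queried concurrently, wall-clock time would be
# bounded by the SLOWEST node instead.
my $parallel_total = max @node_latency;

printf "serial: %.1f s, parallel: %.1f s\n",
    $serial_total, $parallel_total;
```

The observed behaviour (total time scaling linearly with node count)
matches the serial case, not the parallel one.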

My heart sank when I saw this.  Surely I'm doing something wrong?

I even tried removing the SortSpec, but that made no significant difference.

My code is basically:

    my $query_parser = Lucy::Search::QueryParser->new(
        ...,
        default_boolop => 'AND',
    );
    my $parsed_query = $query_parser->parse($query);

    my @searcher;
    foreach my $remote_host (qw(node1 node2 ...)) {
        push @searcher, LucyX::Remote::SearchClient->new(
            peer_address => ...,
            schema       => ...,
        );
    }

    my $poly_searcher = Lucy::Search::PolySearcher->new(
        schema    => $schema,
        searchers => \@searcher,
    );

    my $hits = $poly_searcher->hits(
        query      => $parsed_query,
        sort_spec  => $sort_spec,
        offset     => 0,
        num_wanted => 10,
    );

On the remote end it's the usual LucyX::Remote::SearchServer code on each node.
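
For reference, that per-node server side is just the stock
LucyX::Remote::SearchServer loop; a minimal sketch (the index path and
port number are placeholders, and the exact placement of arguments such
as the port has varied between Lucy releases, so check the docs for
your version):

```perl
use strict;
use warnings;
use Lucy::Search::IndexSearcher;
use LucyX::Remote::SearchServer;

# Open the node's local index and expose it to remote SearchClients.
my $searcher = Lucy::Search::IndexSearcher->new(
    index => '/path/to/index',    # placeholder path
);
my $server = LucyX::Remote::SearchServer->new(
    searcher => $searcher,
);
$server->serve( port => 7890 );   # illustrative port; blocks and serves
```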

