incubator-lucy-user mailing list archives

From goran kent <gorank...@gmail.com>
Subject [lucy-user] Splicing in a bit of caching in remote searcher
Date Mon, 07 Nov 2011 08:36:23 GMT
Hi,

I'm considering changing our established caching mechanism to allow
for more nimble cache refreshing (i.e., when the backend indexes change
beyond some threshold X).  Instead of caching in our reverse-proxy
cluster, I'd like to cache the $response on each remote searcher node.

My idea is to splice into LucyX/Remote/SearchServer.pm's sub serve():

# Process the method call.
read( $client_sock, $buf, 4 );
$len = unpack( 'N', $buf );
read( $client_sock, $buf, $len );
my $response   = $dispatch{$method}->( $self, thaw($buf) );
my $frozen     = nfreeze($response);
my $packed_len = pack( 'N', bytes::length($frozen) );
print $client_sock $packed_len . $frozen;


becomes:

# Process the method call.
read( $client_sock, $buf, 4 );
$len = unpack( 'N', $buf );
read( $client_sock, $buf, $len );
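#---------incision start----------
# NOTE: is_cached(), read_cached_object() and write_cached_object() are
# hypothetical helpers, not part of Lucy.
my $response;
# Key on the method name as well as the raw request bytes, so that
# different calls with identical arguments never collide.
# Requires: use Digest::MD5 qw(md5_hex);
# TODO: check if $buf is the search string
my $cached_object_id = md5_hex( $method . "\0" . $buf );

if ( is_cached($cached_object_id) ) {
    $response = read_cached_object($cached_object_id);
}
else {
    $response = $dispatch{$method}->( $self, thaw($buf) );
    write_cached_object( $cached_object_id, $response );    # store on a miss
}
#---------incision end----------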
my $frozen     = nfreeze($response);
my $packed_len = pack( 'N', bytes::length($frozen) );
print $client_sock $packed_len . $frozen;

....
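
For reference, the framing that both versions of serve() rely on (a 4-byte
big-endian length followed by a Storable payload) can be sketched in
isolation.  This is only an illustration: encode_frame() and decode_frame()
are hypothetical names, not Lucy functions, and plain length() is used
because nfreeze() returns a byte string.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Encode a data structure as a 4-byte big-endian length followed by
# the Storable-serialized payload.
sub encode_frame {
    my ($data) = @_;
    my $frozen = nfreeze($data);
    return pack( 'N', length($frozen) ) . $frozen;
}

# Read one frame from a filehandle and return the thawed structure.
sub decode_frame {
    my ($fh) = @_;
    read( $fh, my $buf, 4 ) == 4 or die "short read on length";
    my $len = unpack( 'N', $buf );
    read( $fh, $buf, $len ) == $len or die "short read on payload";
    return thaw($buf);
}
```

The payload bytes read in decode_frame() are exactly the $buf that the
caching incision would hash, so the cache key is available before anything
is thawed.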

I seem to recall, though, that the typical search is not an atomic
transaction; i.e., the remote search protocol is broken up into
discrete request/response chunks.  For example:


my $hits = $poly_searcher->hits(
    query      => $parsed_query,
    sort_spec  => $sort_spec,
    offset     => 0,  # or 10, 20, etc
    num_wanted => 10,
);


is processed roughly as:

doc_max/response
doc_freq/response x 31
...
top_docs/response
fetch_doc/response x 10
...
done
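
One possible way to avoid grouping the chunks at all is to cache each
request/response pair on its own, keyed on the method name plus the raw
request bytes.  The sketch below assumes each call is a pure function of
its arguments and the current index contents, which I have not verified
for every method.  %cache, cached_dispatch() and invalidate_cache() are
hypothetical names, and the $compute code ref stands in for the real
$dispatch{$method} call.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Storable qw(nfreeze thaw);

# Hypothetical in-process cache: key => frozen response.
my %cache;

# Build a key from the method name and the raw (still frozen) request.
sub cache_key {
    my ( $method, $buf ) = @_;
    return md5_hex( $method . "\0" . $buf );
}

# Return the response for this request, computing and storing it on a
# miss.  $compute receives the thawed arguments and must return a ref.
sub cached_dispatch {
    my ( $method, $buf, $compute ) = @_;
    my $key = cache_key( $method, $buf );
    $cache{$key} = nfreeze( $compute->( thaw($buf) ) )
        unless exists $cache{$key};
    return thaw( $cache{$key} );
}

# Drop everything, e.g. when the backend index changes beyond threshold X.
sub invalidate_cache {
    %cache = ();
}
```

With this shape, a repeated hits() call would replay the same doc_max,
doc_freq, top_docs and fetch_doc requests and each would be answered from
the cache individually, provided the client serializes identical requests
to identical bytes.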

So, my question is basically:  which parts do I cache, and what's the
best way to identify those parts?  I have a feeling I'm going to have
to package a group of request/responses and cache it in its
entirety, or something like that.  Or maybe this is not feasible within
the given framework.

I essentially need a better understanding of the client/server
interaction process so I can formulate an approach to achieve
remote-end caching of search queries (in Perl of course, since that's
what's being used here).


Comments?

thanks
