Date: Mon, 7 Nov 2011 10:36:23 +0200
From: goran kent
To: lucy-user
Subject: [lucy-user] Splicing in a bit of caching in remote searcher

Hi,

I'm considering changing our established caching mechanism to allow for more nimble cache refreshing (i.e., when the backend indexes change beyond threshold X).

Instead of caching using our reverse-proxy cluster, I'd like to cache the $response on each remote searcher node.

My idea is to splice into LucyX/Remote/SearchServer.pm's sub serve():

    # Process the method call.
    read( $client_sock, $buf, 4 );
    $len = unpack( 'N', $buf );
    read( $client_sock, $buf, $len );
    my $response   = $dispatch{$method}->( $self, thaw($buf) );
    my $frozen     = nfreeze($response);
    my $packed_len = pack( 'N', bytes::length($frozen) );
    print $client_sock $packed_len . $frozen;

becomes,

    # Process the method call.
    read( $client_sock, $buf, 4 );
    $len = unpack( 'N', $buf );
    read( $client_sock, $buf, $len );

    #---------incision start----------
    my $response;
    my $cached_object_id = md5sum($buf);  # TODO: check if $buf is the search string
    if (is_cached($cached_object_id)) {
        $response = read_cached_object($cached_object_id);
    }
    else {
        $response = $dispatch{$method}->( $self, thaw($buf) );
    }
    #---------incision end----------

    my $frozen     = nfreeze($response);
    my $packed_len = pack( 'N', bytes::length($frozen) );
    print $client_sock $packed_len . $frozen;

....

I seem to recall though that the typical search is not an atomic transaction: i.e., the remote search protocol is broken up into discrete request/response chunks:

    my $hits = $poly_searcher->hits(
        query      => $parsed_query,
        sort_spec  => $sort_spec,
        offset     => 0,    # or 10, 20, etc
        num_wanted => 10,
    );

is processed roughly as:

    doc_max/response
    doc_freq/response x 31
    ...
    top_docs/response
    fetch_doc/response x 10
    ...
    done

So, my question is basically: which parts do I cache and what's the best way to identify those parts?
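For concreteness, here is a minimal sketch of the helpers the incision would need. None of this exists in Lucy: cache_key, cache_path, write_cached_object and cached_call are hypothetical names, is_cached and read_cached_object are just the placeholders from my snippet, and the file-per-key cache directory is an assumption. It keys on the method name plus the raw frozen request (assuming $method is in scope at that point in serve()), so requests for different methods never share an entry:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Storable qw(nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical cache location; a real node would use a fixed directory.
my $CACHE_DIR = tempdir( CLEANUP => 1 );

# Key on method name + raw request bytes so doc_freq and top_docs
# requests with identical payloads get distinct entries.
sub cache_key {
    my ( $method, $buf ) = @_;
    return md5_hex( $method . "\0" . $buf );
}

sub cache_path {
    my ($key) = @_;
    return File::Spec->catfile( $CACHE_DIR, $key );
}

sub is_cached {
    my ($key) = @_;
    return ( -e cache_path($key) ) ? 1 : 0;
}

sub read_cached_object {
    my ($key) = @_;
    return retrieve( cache_path($key) );
}

# Store to a temp file first, then rename, so a concurrent reader
# never sees a half-written entry.
sub write_cached_object {
    my ( $key, $response ) = @_;
    my $path = cache_path($key);
    my $tmp  = "$path.$$";
    nstore( $response, $tmp );
    rename( $tmp, $path ) or die "rename failed: $!";
    return;
}

# Return the cached response if present, otherwise run $compute
# (a coderef wrapping the real dispatch) and cache its result.
sub cached_call {
    my ( $method, $buf, $compute ) = @_;
    my $key = cache_key( $method, $buf );
    return read_cached_object($key) if is_cached($key);
    my $response = $compute->();
    write_cached_object( $key, $response );
    return $response;
}
```

Inside the incision this would be used roughly as cached_call( $method, $buf, sub { $dispatch{$method}->( $self, thaw($buf) ) } ). Unlike my snippet above, it also writes the entry on a miss. It says nothing yet about which methods are safe to cache or when entries should expire.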
I have a feeling I'm going to have to package a group of request/responses and cache the group in its entirety... or something. Or maybe this is not feasible within the given framework.

I essentially need a better understanding of the client/server interaction process so I can formulate an approach to achieve remote-end caching of search queries (in Perl of course, since that's what's being used here).

Comments?

thanks