lucy-user mailing list archives

From: Marvin Humphrey
Subject: Re: [lucy-user] Concurrent searching
Date: Wed, 23 Nov 2011 12:41:11 GMT

On Wed, Nov 23, 2011 at 01:25:09PM +0200, goran kent wrote:
> Something is weird with the length for the top_docs packet.
> In SearchServer::serve, ~line 106, the confess is chucking a null
> error because $check_val != $len, hence the meaningless error:
> " at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/
> line 106"
> In ClusterSearcher::_serialize_request for top_docs
> length($serialized)==6959, but SearchServer::serve is receiving
> length==2892.
> So, that's why SearchServer is failing.  What's causing the short send
> (or receive, or pack/unpack not co-operating across machines) will
> hopefully soon be revealed.

As we move away from blocking i/o, a single sysread() or syswrite() can
transfer fewer bytes than requested, so we need to manage buffers manually and
be prepared for partial success.  (Eventually we need to deal with timeouts and
failovers, because otherwise the system remains vulnerable to its weakest
link and hangs when a single node goes down -- but that's for later.)

> Suggested patch:
> - confess $! unless $check_val == $len;
> + confess "packet length mismatch: $!" unless $check_val == $len;

Those confess() calls are placeholders, to be swapped out at some future
time with a less aggressive error reporting mechanism that does not take down
the server process.  The idea was to use confess() during early rapid
prototyping to flag each place a system call return value needs to be checked.
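
As a rough illustration of what "less aggressive" might look like, here is a
minimal sketch that traps a failure in one client's handler and logs it
instead of dying.  The name serve_one() and the $handler code ref are
hypothetical, made up for this example; they are not part of LucyX::Remote.

```perl
use strict;
use warnings;
use Carp qw( carp );

# Run one client's request handler.  On failure, log the error and report
# it to the caller instead of letting the exception kill the server loop.
# $handler is a hypothetical code ref standing in for the per-request work.
sub serve_one {
    my ( $handler, @args ) = @_;
    my $ok = eval { $handler->(@args); 1 };
    return 1 if $ok;
    my $err = $@ || 'unknown error';
    carp "dropping client: $err";
    return 0;
}
```

The caller can then close the offending socket when serve_one() returns
false and carry on serving everyone else.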

In some cases, including here, the code also needs to be refactored around
non-blocking i/o.  What we ultimately need to do is accept a partial read,
store the incomplete buffer, and return to waiting for the next ready socket.
The code will become more complicated because we'll have to keep multiple
buffers alive, but that's concurrency for ya.
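
A minimal sketch of that buffering, under two assumptions that are mine and
not taken from the current code: each message is framed as a 4-byte
big-endian length (pack 'N') followed by the payload, and the names %pending
and feed_chunk() are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical per-socket state: key (e.g. fileno) => bytes received so far.
my %pending;

# Append a freshly read chunk to the socket's buffer and return every
# complete message now available.  Assumes each message is a 4-byte
# big-endian length (pack 'N') followed by that many payload bytes.
sub feed_chunk {
    my ( $key, $chunk ) = @_;
    $pending{$key} = '' unless defined $pending{$key};
    $pending{$key} .= $chunk;
    my @messages;
    while ( length( $pending{$key} ) >= 4 ) {
        my $len = unpack( 'N', $pending{$key} );
        last if length( $pending{$key} ) < 4 + $len;    # still incomplete
        push @messages, substr( $pending{$key}, 4, $len );
        substr( $pending{$key}, 0, 4 + $len, '' );      # consume the frame
    }
    return @messages;
}
```

In a select() loop you would call feed_chunk( fileno($sock), $buf ) after
each successful sysread() and only dispatch a request once it returns
something.  An empty list just means "go back to waiting".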

For now though, try this:

   * Change every sysread() to read(), and every syswrite() to print() or the
     IO::Handle write() method (Perl's builtin write() is for formats, not
     raw output).
   * Set $socket->autoflush(1);
   * Make sure 'Blocking => 0' is commented out.
   * Replace the select() loop with a "for" loop, because select() and
     buffered i/o don't mix.
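
Roughly, the blocking version could look like the sketch below.  The helper
names read_exactly(), read_message() and write_message() are hypothetical,
the 4-byte 'N' length prefix is an assumption about the framing, and the
output side uses print(), which is my reading of the intent rather than
anything in the current code.

```perl
use strict;
use warnings;

# Read exactly $want bytes from a blocking handle, or die trying.
sub read_exactly {
    my ( $fh, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        # The 4th argument appends at the current end of $buf.
        my $got = read( $fh, $buf, $want - length($buf), length($buf) );
        die "read failed: $!" unless defined $got;
        die "unexpected EOF after " . length($buf) . " of $want bytes"
            if $got == 0;
    }
    return $buf;
}

# Read one frame: 4-byte big-endian length, then payload (assumed framing).
sub read_message {
    my ($fh) = @_;
    my $len = unpack( 'N', read_exactly( $fh, 4 ) );
    return read_exactly( $fh, $len );
}

# Write one frame with print(); with autoflush on, it goes out right away.
sub write_message {
    my ( $fh, $payload ) = @_;
    print {$fh} pack( 'N', length($payload) ), $payload
        or die "print failed: $!";
    return;
}
```

With this shape, a short transfer shows up as an explicit "unexpected EOF"
message with byte counts rather than an empty $!.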

What I'm hoping to do with those changes is return to forcing every socket
communication to block, restoring predictable program execution order.

Marvin Humphrey
