lucy-user mailing list archives

From Nathan Kurz
Subject Re: [lucy-user] Collapsing and highlighting
Date Mon, 12 Dec 2011 08:11:53 GMT
On Sat, Dec 10, 2011 at 8:24 PM, lance bowler wrote:
> I had a gander at the ClusterSearcher code and it indeed only accepts
> 1 search request at a time - unless my perl grokness is failing me.

You're likely understanding it correctly.  The out-of-the-box Lucy
support for clusters is limited, but the design is such that you
should be able to customize it for your own needs quite easily.  For
anything beyond a single machine, Lucy is more of a high-performance
toolkit than a ready-to-go solution.

> At a basic level, a server should accept inbound connections and
> fork/thread off and handle each request concurrently (a-la xinetd,
> etc).

This isn't a bad approach, and depending on your needs, using xinetd
directly might even be possible.  You'd probably be better off starting
a pool of servers (prefork) and running a supervisor over them to
restart them if necessary.  Two processes per core would be a good
starting point; then test to see how performance changes with higher or
lower numbers.
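
As a rough sketch of that sizing rule (Python purely for illustration;
this is not part of Lucy, and the function names are made up), you could
compute the starting pool size and a few neighbouring sizes to benchmark:

```python
import os


def initial_pool_size(cores=None, per_core=2):
    """Starting number of prefork workers: `per_core` processes per core."""
    if cores is None:
        # os.cpu_count() can return None on unusual platforms.
        cores = os.cpu_count() or 1
    return max(1, cores * per_core)


def candidate_pool_sizes(cores, per_core=2):
    """Pool sizes worth benchmarking: half, equal to, and double the start."""
    start = initial_pool_size(cores, per_core)
    return sorted({max(1, start // 2), start, start * 2})
```

On a 4-core box this suggests starting at 8 workers and also trying 4
and 16.  The half/double spread is just an assumption about what
"higher or lower" might mean; pick whatever range suits your hardware.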

> Even if each request takes only 0.2s to complete, 10 such
> requests (or 50...) would rapidly push that number up to 2s (or
> 10s...)   -- or am I not missing the mark here?

This is where it starts to get tricky.  What sort of index are you
envisioning for what sort of searches?  Are you ever going to be
hitting disk, or are you always in RAM? Your estimated 0.2s might be
high or low by a lot, which would affect your approach.  You're not
missing the mark, but you probably need to do some preliminary testing
before deciding how to proceed.
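
For the preliminary testing, something as simple as the following would
do (a minimal sketch in Python; `search_fn` is a stand-in for whatever
actually runs a query against your index, not a Lucy call):

```python
import statistics
import time


def measure_latency(search_fn, queries):
    """Run each query once and return (median, worst) latency in seconds."""
    if not queries:
        raise ValueError("need at least one query to time")
    timings = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), max(timings)
```

Running it once right after a restart and again once the index has been
read a few times should give a hint as to whether you are hitting disk
or staying in RAM.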

> We have spikes of traffic and concurrent searching would leave tens -
> or hundreds - of users staring at a rotating wheel while the search
> client waits its turn...  bad

Unfortunately, running a pool of servers might not make this much
better.  Unlike a web server, which is mostly waiting to push files
over slow internet links, search is resource intensive.  Depending on
your usage and hardware, you may be limited by disk I/O, processor, or
memory I/O.  More processes only help if you have processor to spare.
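
To put rough numbers on that, here is a back-of-the-envelope model (a
Python sketch under strong assumptions: every search is CPU-bound,
costs the same, and all requests arrive at once; the function name is
made up):

```python
import math


def time_to_drain(requests, service_time, workers, cores):
    """Seconds until the last of `requests` simultaneous searches finishes.

    Assumes CPU-bound searches of equal cost, so workers beyond the
    number of cores add no real parallelism.
    """
    effective = max(1, min(workers, cores))
    return math.ceil(requests / effective) * service_time
```

With your 0.2s estimate, 50 requests on a single worker take about 10s
to drain.  Four workers on four cores bring that to about 2.6s, but
sixteen workers on the same four cores still come out at about 2.6s
under this model.  Disk or memory bottlenecks would make the real
picture worse, not better.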

Nathan Kurz
