lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] SearchServer / ClusterSearcher - massive performance hit
Date Mon, 22 Oct 2012 19:18:55 GMT
On Mon, Oct 22, 2012 at 8:12 AM, Dag Lem <dag@nimrod.no> wrote:

> I've started playing around a bit with Lucy, and I have to say it's
> really, really nice!

:)

> However I've run into a problem trying to increase performance using
> sharding with SearchServer / ClusterSearcher. In my tests, I get close
> to a tenfold *drop* in performance using a few local shards (say, 3
> shards on a 4 core server).

"Performance" can mean many things.  By design, ClusterSearcher is going to
degrade metrics in some areas relative to a single index, while improving
in others.

For instance, it is perfectly acceptable if queries which return a single hit
are slower under ClusterSearcher than under a single local IndexSearcher.
Single-hit queries tend to be dominated by per-query overhead rather than
per-hit search processing, and the per-query cost of ClusterSearcher is much
higher than that of IndexSearcher.  A tenfold degradation is not
inconceivable.

If the search profile of an application is dominated by such small lookup
queries -- for instance, if you are using Lucy as a key-value store -- it
would be best to avoid ClusterSearcher until you absolutely have to use it.
Instead, you would want to invest in either RAM or SSDs.

ClusterSearcher is intended for a different search query profile, though: it
is optimized for large, computationally expensive queries which are dominated
by per-hit search processing and potentially return many hits.
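
To make the distinction concrete, here is a rough cost model, sketched in
Python.  It is not a measurement of Lucy: the function names and every number
in it are hypothetical, chosen only so that the fixed cluster overhead is large
relative to a cheap query.  It assumes per-hit work divides evenly across
shards and that the shards run fully in parallel.

```python
def single_index_ms(hits, per_query_ms, per_hit_ms):
    """Latency of one local searcher: fixed overhead plus per-hit work."""
    return per_query_ms + per_hit_ms * hits


def cluster_ms(hits, shards, per_query_ms, per_hit_ms, cluster_overhead_ms):
    """Latency when each shard processes hits/shards in parallel.

    cluster_overhead_ms stands in for the extra fixed cost of a remote
    query (networking, serialization); its value here is an assumption.
    """
    return cluster_overhead_ms + per_query_ms + per_hit_ms * hits / shards


# Hypothetical costs, for illustration only.
PER_QUERY_MS = 0.5
PER_HIT_MS = 0.001
CLUSTER_OVERHEAD_MS = 4.5
SHARDS = 3

for hits in (1, 1_000_000):
    local = single_index_ms(hits, PER_QUERY_MS, PER_HIT_MS)
    sharded = cluster_ms(hits, SHARDS, PER_QUERY_MS, PER_HIT_MS,
                         CLUSTER_OVERHEAD_MS)
    print(f"{hits:>9} hits: local {local:9.3f} ms, cluster {sharded:9.3f} ms")
```

Under these made-up numbers the single-hit query comes out roughly ten times
slower through the cluster, while the million-hit query comes out roughly
three times faster.  The crossover point depends entirely on the real
overheads, which have to be measured.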

> While I would expect some overhead using SearchServer / ClusterSearcher,
> the close to tenfold increase in search time I experience does seem
> rather excessive. I'd need an exorbitant amount of shards just to get
> the same performance as by using a single index, if I'd ever get there...

Is your search query dominated by per-query or per-hit costs -- i.e. does it
return quickly at the level of a single IndexSearcher?
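
One way to answer that, sketched in Python under the assumption that latency is
roughly linear in hit count: time a range of queries against a single
IndexSearcher, record (total hits, milliseconds) for each, and fit a line.  The
intercept approximates the per-query cost and the slope the per-hit cost.  The
helper name and the sample figures below are hypothetical; real timings would
come from your own application.

```python
def fit_costs(samples):
    """Least-squares fit of ms = per_query + per_hit * hits.

    samples: list of (hits, milliseconds) pairs from timed queries.
    Returns (per_query_ms, per_hit_ms).
    """
    n = len(samples)
    mean_hits = sum(h for h, _ in samples) / n
    mean_ms = sum(ms for _, ms in samples) / n
    sxx = sum((h - mean_hits) ** 2 for h, _ in samples)
    if sxx == 0:
        raise ValueError("need queries with differing hit counts")
    sxy = sum((h - mean_hits) * (ms - mean_ms) for h, ms in samples)
    per_hit = sxy / sxx
    per_query = mean_ms - per_hit * mean_hits
    return per_query, per_hit


# Hypothetical timings, for illustration only.
timings = [(1, 0.6), (1_000, 1.4), (100_000, 95.0), (500_000, 510.0)]
per_query_ms, per_hit_ms = fit_costs(timings)
print(f"per-query ~ {per_query_ms:.3f} ms, per-hit ~ {per_hit_ms:.6f} ms")
```

If the typical query's hit count times the per-hit estimate is small next to
the per-query estimate, the workload is dominated by per-query costs.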

If the costs are mostly per-query, then degradation in ClusterSearcher is
to be expected and arguably less of a concern.  (Theoretically, we might look
into things like changing how we do object serialization if we wanted to
improve matters.)

If the query is expensive to begin with, though -- because it is dominated by
per-hit costs -- then it would be unexpected to see ClusterSearcher perform
poorly, and we would want to find out why.

> If there is anything I can do to help isolate any possible problem,
> please do tell me so (e.g. strace / perl profiling / ...)

We're not there yet.  If we see expensive queries take longer in
ClusterSearcher, I think some Perl profiling might help.  If, however, only
cheap queries are slower, then we'd want to focus on optimizing your
application first.

Marvin Humphrey
