lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject RE: Solr load balancer
Date Fri, 01 Feb 2013 01:49:56 GMT

For what it's worth, Google has done some pretty interesting research into coping with the
idea that particular shards might very well be busy doing something else when your query comes
in.

Check out this slide deck: http://research.google.com/people/jeff/latency.html
Lots of interesting ideas, but in particular, around slide 39 he talks about "backup requests"
where you wait for something like your typical response time and then issue a second request
to a different shard. You take whichever answer you get first, and cancel the other. The initial
wait + cancellation means your extra cluster load is minimal, and you still get the benefit
of reducing your p95+ response times if the first request was high-latency due to something
unrelated to the query. (Say, GC.)
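
The "backup request" idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not Solr or SolrJ code: the class name BackupRequest, the method firstOf, and the two Callable arguments (standing in for the same query sent to two different replicas) are hypothetical names invented for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Solr: illustrates the backup-request pattern.
final class BackupRequest {
    /**
     * Sends the primary request. If it has not completed within backupDelayMs
     * (roughly your typical response time), sends the backup request as well,
     * returns whichever completes first, and cancels the other.
     */
    static <T> T firstOf(Callable<T> primary, Callable<T> backup,
                         long backupDelayMs, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        ExecutorCompletionService<T> ecs = new ExecutorCompletionService<>(pool);
        Future<T> first = ecs.submit(primary);

        // Common case: the primary answers in time and no backup is ever sent.
        Future<T> done = ecs.poll(backupDelayMs, TimeUnit.MILLISECONDS);
        if (done != null) {
            return done.get();
        }

        // Slow case: issue the backup and take whichever finishes first.
        Future<T> second = ecs.submit(backup);
        try {
            return ecs.take().get();
        } finally {
            // cancel(true) only interrupts the worker thread; the remote
            // server keeps working unless it supports cancellation itself.
            first.cancel(true);
            second.cancel(true);
        }
    }
}
```

The pool needs at least two threads so the backup can run while the primary is still pending. Because the backup is only issued after the delay, the extra load is limited to the fraction of requests that are already slower than usual.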

Of course, a central principle of this approach is being able to cancel a query and have it
stop consuming resources. I'd love to be corrected, but I don't think Solr allows this. You
can stop waiting for a response, but even the timeAllowed param doesn't seem to stop resource
usage after the allotted time. That means a few exceptionally long-running queries can take
out your high-throughput cluster by tying up entire CPUs for long periods.
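
The difference between "stop waiting" and "stop working" can be shown with a small plain-Java analogy. This is a sketch only and says nothing about Solr's internals: the class AbandonVsCancel, the simulated query, and the timings are all assumptions made for illustration. A task that never checks for interruption keeps burning CPU after the caller gives up, while one that checks cooperatively stops soon after being cancelled.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo, not Solr code.
final class AbandonVsCancel {
    /** Simulated query: spins for about 400 ms, counting units of work done. */
    static Callable<Long> query(AtomicLong workDone, boolean checksInterrupt) {
        return () -> {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(400);
            while (System.nanoTime() < deadline) {
                // Cooperative cancellation: stop only if the task checks for it.
                if (checksInterrupt && Thread.currentThread().isInterrupted()) {
                    break;
                }
                workDone.incrementAndGet();
            }
            return workDone.get();
        };
    }

    /** Returns true if the task was still consuming CPU after the caller gave up. */
    static boolean keepsWorkingAfterTimeout(boolean checksInterrupt) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicLong work = new AtomicLong();
        Future<Long> result = pool.submit(query(work, checksInterrupt));
        try {
            result.get(50, TimeUnit.MILLISECONDS);   // caller's patience runs out
        } catch (TimeoutException e) {
            result.cancel(true);                     // request an interrupt
        }
        Thread.sleep(50);                            // give a cooperative task time to stop
        long snapshot = work.get();
        Thread.sleep(100);
        boolean stillWorking = work.get() > snapshot;
        pool.shutdownNow();
        return stillWorking;
    }
}
```

Calling keepsWorkingAfterTimeout(false) models the situation described above: the caller has its timeout, but the work continues. Calling it with true models a query that can genuinely be cancelled, which is what the backup-request approach depends on.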

Let me know the JIRA number; I'd love to see work in this area.


-----Original Message-----
From: Phil Hoy [mailto:phoy@brightsolid.com] 
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks, I have read the blogs you cited and found them very interesting. We have tuned
the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup: we index a lot of small documents using servers
with SSDs and 128 GB RAM in a sharded setup with replicas, and our queries rely heavily on
query filters and faceting with minimal free-text style searching. For that reason we rely
heavily on the filter cache to improve query latency, so we assign a large percentage
of available RAM to the JVM hosting Solr.

Anyhow, we are happy with the current configuration and performance profile, aside from the
odd GC pause, and as we have index replicas it seems to me that we should be able
to cope, hence my willingness to tweak how the load balancer behaves.

Thanks,
Phil



-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's a great place
to start:

http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to multiple nodes
in your cluster, you'll be effectively doubling (at least) the load on your system, perhaps
leading to more memory issues in a "non-virtuous cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy <phoy@brightsolid.com> wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query latency
> in the face of long GC pauses and the odd time-consuming query that we need to be able to
> support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help,
> but of course it can only be set to a length of time as long as the most time-consuming query
> we are likely to receive.
>
> For example perhaps a load balancer that sends multiple queries concurrently to all/some
> replicas and only keeps the first response might be effective. Or maybe a load balancer which
> takes account of the frequency of timeouts would be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having to hack Solr
> directly, I would need to be able to make the existing LBHttpSolrServer and HttpShardHandlerFactory
> more amenable to extension. I can then override the default load balancer using Solr's plugin
> mechanism.
>
> So my question is, if I made a patch to make the load balancer more pluggable, is this
> something that would be acceptable and if so what do I do next?
>
> Phil

