lucene-solr-user mailing list archives

From Greg Solovyev <g...@zimbra.com>
Subject Re: CloudSolrServer, concurrency and too many connections
Date Wed, 10 Dec 2014 19:05:31 GMT
I am seeing the same problem with 4.10.2 and 4.9.0. CloudSolrServer keeps opening connections
to ZK and never closes them. Eventually (very soon) ZK runs out of connections and stops accepting
new ones. 

Thanks,
Greg

----- Original Message -----
From: "JoeSmith" <fidwork@gmail.com>
To: "solr-user" <solr-user@lucene.apache.org>
Sent: Sunday, December 7, 2014 8:11:50 PM
Subject: Re: CloudSolrServer, concurrency and too many connections

I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
problem when connecting to the Zookeeper port.  If I connect directly to
SolrServer, the connections do not increase.  But when connecting to
Zookeeper, the connections increase up to 60 and then start to fail.  I
understand Zookeeper is configured to fail after 60 connections to prevent
a DoS attack, but I don't see why we keep adding new connections (up to
60).  Does the client-side Zookeeper code also use HttpClient
ConnectionPooling for its Connection Pool?  Below is the Exception that
shows up in the log file when this happens.  When we execute queries we are
using the _route_ parameter; could this explain anything?

o.a.zookeeper.ClientCnxn - Session 0x0 for server
aweqca3utmtc10.cloud.xxxx.com/10.22.10.107:9983, unexpected error, closing
socket connection and attempting reconnect

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_55]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_55]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_55]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]


Will try to get the server code upgraded to 4.10.2.
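
For context on the limit mentioned above: the 60-connection ceiling is ZooKeeper's maxClientCnxns setting, which caps concurrent connections from a single client IP address (60 is the default in the 3.4 line that appears in the stack trace). A sketch of the relevant line, assuming a ZooKeeper server configured through a zoo.cfg file:

```
# zoo.cfg (fragment)
# Maximum number of concurrent connections a single client IP may hold
# to this ZooKeeper server.
maxClientCnxns=60
```

Raising the value only delays the failure if the client keeps opening connections without closing them.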



On Sat, Dec 6, 2014 at 3:52 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 12/6/2014 12:09 PM, JoeSmith wrote:
> > We are currently using CloudSolrServer, but it looks like this class is
> > not thread-safe (setDefaultCollection). Should this instance be
> > initialized once (at startup) and then re-used (in all threads) until
> > shutdown when the process terminates?  Or should it be re-instantiated
> > for each request?
> >
> > Currently, we are trying to use CloudSolrServer as a singleton, but it
> > looks like the connections to the host are not being closed and under
> > load we start getting failures.  In the Zookeeper logs we see this error:
> >
> >> WARN  - 2014-12-04 10:09:14.364;
> >> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections
> >> from /11.22.33.44 - max is 60
> >
> > netstat (on the Zookeeper host) shows that the connections are not being
> > closed. What is the 'correct' way to fix this?   Apologies if I have
> > missed any documentation that explains this; pointers would be helpful.
>
> All SolrServer implementations in SolrJ, including CloudSolrServer, are
> supposed to be threadsafe.  If it turns out they're not actually
> threadsafe, then we treat that as a bug.  The discussion to determine
> that it's a bug takes place on this mailing list, and once we determine
> that, the next step is to file an issue in Jira.
>
> The general way to use SolrJ is to initialize the server instance at the
> beginning and re-use it for all client communication to Solr.  With
> CloudSolrServer, you normally only need a single server instance to talk
> to the entire cloud, because you can set the "collection" parameter on
> each request to indicate which collection to work on.  If you only have
> a handful of collections, you might want to use multiple instances and
> use setDefaultCollection  to specify the collection.  With
> HttpSolrServer, an instance is required for each core, because the core
> name is in the initialization URL.
>
> I've not looked at the code, but I can't imagine that the client ever
> needs to make more than one connection to each server in the zookeeper
> ensemble.  Here's a list of the open connections on one of my zookeeper
> servers for my SolrCloud 4.2.1 install:
>
> java    21800 root   21u  IPv6            2836983      0t0      TCP
> 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
> java    21800 root   22u  IPv6            2661097      0t0      TCP
> 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
> java    21800 root   26u  IPv6           28065088      0t0      TCP
> 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
> java    21800 root   27u  IPv6           23967470      0t0      TCP
> 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
> java    21800 root   28r  IPv6           23969636      0t0      TCP
> 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
> java    21800 root   29r  IPv6           23969951      0t0      TCP
> 10.8.0.151:3888->10.8.0.153:54721 (ESTABLISHED)
>
> The 151, 152, and 153 addresses are my ZK servers, with Solr also
> running on 151 and 152.  The 141 address is the SolrJ client.  The main
> ZK port is 2181, with ports 2888 and 3888 used for internal zookeeper
> communication.  I actually would have expected to see two client
> connections from .141 ... one for the indexer program and one for the
> webapp.  They haven't reported a Solr problem to me, so I guess it must
> be OK.
>
> If your install is re-establishing connections and not closing the old
> ones, then there is either something wrong with your setup or a bug.
> Because there are not a large number of people with the same complaint,
> I would lean more towards problems in your setup.  I won't rule out the
> possibility that there's a bug, because we've had a lot of them.
>
> One thing to try immediately is upgrading to 4.10.2 ... there have been
> two bugfix releases since the version you're running came out, with 16
> bug issues closed.  None of those issues sounds like what you're running
> into, but sometimes when mistakes are noticed in the code, fixing them
> can make other seemingly unrelated problems go away.  Upgrading to a
> bugfix release on the same minor version should be a drop-in replacement
> with no configuration changes necessary.
>
> http://lucene.apache.org/solr/4_10_2/changes/Changes.html
>
> Beyond that, we need more information.  Are there ERROR or WARN messages
> in your Solr log and/or your SolrJ client log that don't come from bad
> queries?  If there are, it may indicate some kind of problem, especially
> if they relate to the zk client timeout.  Problems like that can be
> caused by general performance issues, including garbage collection pauses.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Depending on what is found in your log, other questions about your setup
> may need answering.
>
> Thanks,
> Shawn
>
>
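
The usage pattern Shawn describes (create the client once at startup, share it across all threads, shut it down once when the process exits) can be sketched roughly as follows. This is a minimal illustration using only the JDK, not SolrJ code: `SharedClient` and `SharedClientDemo` are hypothetical names, and a plain `Object` stands in for the real client. In a SolrJ 4.x application the factory would construct the CloudSolrServer and the closer would call its shutdown() method. The sketch uses Java 8 lambdas, which is newer than the 1.7 runtime shown in the stack trace.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical holder: builds one client lazily, shares it, closes it once. */
final class SharedClient<T> {
    private final Supplier<T> factory;   // how to build the client (assumption: cheap to call once)
    private final Consumer<T> closer;    // how to release its connections
    private T instance;
    private boolean closed;

    SharedClient(Supplier<T> factory, Consumer<T> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    /** Returns the single shared instance, creating it on first use. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("client already shut down");
        }
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Releases the client exactly once; later calls are no-ops. */
    synchronized void shutdown() {
        if (closed) {
            return;
        }
        closed = true;
        if (instance != null) {
            closer.accept(instance);
        }
    }
}

final class SharedClientDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();

        // A plain Object stands in for the real client in this sketch.
        SharedClient<Object> holder = new SharedClient<Object>(
                () -> { created.incrementAndGet(); return new Object(); },
                client -> System.out.println("client shut down"));

        // Many request threads all reuse the same instance.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> holder.get());
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        holder.shutdown();
        System.out.println("clients created: " + created.get());
    }
}
```

If an application instead constructs a new client per request and never shuts it down, each instance can hold its own ZooKeeper session open, which would be one possible way to reach the per-IP connection limit reported in this thread.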
