incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: All Connections Are Bad...
Date Sun, 11 Dec 2016 15:07:47 GMT
I believe this timer does in fact test the pooled client connections.  In my
experience the "All connections are bad" exception usually occurs when a shard
server is not responding in a timely manner.  It could be GCing, blocking
on HDFS, or hitting some other unknown problem.

Timer:

https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L98

Also there is a test method that will test connections before their use.

https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L299
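
To make the two mechanisms concrete, here is a minimal sketch of the general idea: hosts get marked bad on failure, a background timer re-tests them, and a caller only sees "All connections are bad." when every host is currently marked bad. This is not the actual ClientPool code. The class name BadConnectionTracker, its methods, and the probe predicate are all hypothetical names made up for illustration; a real probe would be some cheap remote call against the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch only; names and structure are assumptions, not Blur's API.
class BadConnectionTracker {
    // Hosts currently considered bad (presence of the key means "bad").
    private final Map<String, Boolean> bad = new ConcurrentHashMap<>();
    private final List<String> hosts;
    // Assumed health check: returns true if the host answers.
    private final Predicate<String> probe;

    BadConnectionTracker(List<String> hosts, Predicate<String> probe) {
        this.hosts = new ArrayList<>(hosts);
        this.probe = probe;
    }

    // Called by the request path when an I/O or transport error occurs.
    void markBad(String host) {
        bad.put(host, Boolean.TRUE);
    }

    boolean isBad(String host) {
        return bad.containsKey(host);
    }

    // One pass of the background check: hosts that answer the probe are
    // removed from the bad set; hosts that fail or throw stay marked bad.
    void recheckBadHosts() {
        for (String host : bad.keySet()) {
            boolean healthy;
            try {
                healthy = probe.test(host);
            } catch (RuntimeException e) {
                healthy = false;
            }
            if (healthy) {
                bad.remove(host);
            }
        }
    }

    // Returns the first host not marked bad, or fails if none is left.
    String pick() {
        for (String host : hosts) {
            if (!isBad(host)) {
                return host;
            }
        }
        throw new IllegalStateException("All connections are bad.");
    }

    // Runs recheckBadHosts() periodically on a daemon thread.
    ScheduledExecutorService startTimer(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "bad-connection-recheck");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleWithFixedDelay(this::recheckBadHosts,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

With something like this, a server that is paused (for example in a long GC) would fail the probe and stay marked bad until a later pass succeeds, which matches the symptom described above: the exception shows up while the servers are slow and goes away once they respond again.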

Hope this helps.

Aaron



On Sat, Dec 10, 2016 at 5:56 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Just now tried to understand the logic...
>
> Whenever an IOException/TTransportException is thrown, we mark a Connection
> as bad. Slowly when all Connections are greeted by this, we get "All
> Connections Bad..."
>
> Is it a good idea to write a reaper thread to proactively try & replenish
> the bad Connection, instead of waiting for search to hit it at the wrong
> moment?
>
> Also, I just found that "staleness" check is eagerly performed. It should
> be possible to return a live connection & refresh stale ones in background?
> [*ClientPool.getConnection(Connection conn)*]
>
> --
> Ravi
>
>
>
> On Sat, Dec 10, 2016 at 3:44 PM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Often, I find myself bang in the middle of a query, when BlurClientManager
> > comes up with this error. Happens both ways. When my app-server talks to
> > controller-server as well as controller-server talks to shard-server. This
> > is affecting search experience quite a bit nowadays in production!!
> >
> > BlurException(message:Unknown error during remote call to node
> > [AAA.BB.CCC.DD:40020], stackTraceStr:org.apache.blur.thrift.BadConnectionException:
> > Could not connect to controller/shard server. All connections are bad.
> >   at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:243)
> >   at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:314)
> >   at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote$1.call(BlurControllerServer.java:132)
> >   at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote.execute(BlurControllerServer.java:139)
> >
> > When do we get such an Exception? Incorrect timeout settings,
> > shard-server restarts, etc.?
> >
> > Any help is much appreciated
> >
> > --
> > Ravi
> >
>
