incubator-blur-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: All Connections Are Bad...
Date Thu, 15 Dec 2016 12:03:10 GMT
Thanks, Aaron, for the clarification.

On Sun, Dec 11, 2016 at 8:37 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> I believe this timer does in fact test the pooled client connections.  In my
> experience the "all connections are bad" exception usually occurs when a shard
> server is not responding in a timely manner.  It could be GCing, blocking
> on HDFS, or hitting some other unknown problem.
>
> Timer:
>
> https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L98
>
> Also there is a test method that will test connections before their use.
>
> https://github.com/apache/incubator-blur/blob/master/blur-thrift/src/main/java/org/apache/blur/thrift/ClientPool.java#L299
>
> Hope this helps.
>
> Aaron
>
>
>
> On Sat, Dec 10, 2016 at 5:56 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Just now tried to understand the logic...
> >
> > Whenever an IOException/TTransportException is thrown, we mark a
> > Connection as bad. Once every Connection has been marked this way,
> > we get "All Connections Bad..."
> >
> > Is it a good idea to write a reaper thread to proactively try & replenish
> > the bad Connection, instead of waiting for search to hit it at the wrong
> > moment?
> >
> > Also, I just found that the "staleness" check is performed eagerly.
> > Shouldn't it be possible to return a live connection and refresh stale
> > ones in the background?
> > [*ClientPool.getConnection(Connection conn)*]
> >
> > --
> > Ravi
> >
> >
> >
> > On Sat, Dec 10, 2016 at 3:44 PM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > Often, I find myself bang in the middle of a query when
> > > BlurClientManager comes up with this error. It happens both ways: when
> > > my app-server talks to the controller-server, and when the
> > > controller-server talks to a shard-server. This is affecting the search
> > > experience quite a bit nowadays in production!
> > >
> > > BlurException(message:Unknown error during remote call to node
> > > [AAA.BB.CCC.DD:40020],
> > > stackTraceStr:org.apache.blur.thrift.BadConnectionException: Could not connect to controller/shard server. All connections are bad.
> > > at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:243)
> > > at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:314)
> > > at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote$1.call(BlurControllerServer.java:132)
> > > at org.apache.blur.thrift.BlurControllerServer$BlurClientRemote.execute(BlurControllerServer.java:139)
> > >
> > > When do we get such an exception? With incorrect timeout settings,
> > > shard-server restarts, etc.?
> > >
> > > Any help is much appreciated
> > >
> > > --
> > > Ravi
> > >
> >
>
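
For readers following the thread, the "reaper thread" idea raised above (re-testing connections marked bad in the background rather than waiting for a search to hit them) could be sketched roughly as below. This is a standalone illustration, not Blur's ClientPool code: the class name BadConnectionReaper, its methods, and the use of a plain "host:port" string plus a caller-supplied probe are all assumptions made for the example. As noted in the reply above, ClientPool already runs a timer that tests pooled connections, so a sketch like this is only useful for understanding the pattern.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch: tracks "bad" endpoints and re-probes them in the background. */
public class BadConnectionReaper implements AutoCloseable {
    // Endpoints ("host:port") currently considered bad.
    private final Set<String> bad = ConcurrentHashMap.newKeySet();
    // Caller-supplied health check; returns true if the endpoint answers again.
    private final Predicate<String> probe;
    private final ScheduledExecutorService scheduler;

    public BadConnectionReaper(Predicate<String> probe) {
        this.probe = probe;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "bad-connection-reaper");
            t.setDaemon(true); // do not keep the JVM alive for the reaper
            return t;
        });
    }

    /** Called by client code when an I/O or transport error is seen. */
    public void markBad(String hostPort) {
        bad.add(hostPort);
    }

    public boolean isBad(String hostPort) {
        return bad.contains(hostPort);
    }

    /** One pass: every bad endpoint whose probe succeeds is cleared. */
    public void reapOnce() {
        bad.removeIf(hostPort -> {
            try {
                return probe.test(hostPort);
            } catch (RuntimeException e) {
                return false; // a failing probe leaves the endpoint marked bad
            }
        });
    }

    /** Runs reapOnce() repeatedly with a fixed delay between passes. */
    public void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
                this::reapOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In a real client the probe would presumably open a socket or issue a cheap Thrift call with a short timeout. If the shard server is stalled in GC or blocked on HDFS, the probe fails too, so the endpoint stays marked bad until the server recovers.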
