From: Anthony Ikeda
To: user@cassandra.apache.org
Date: Tue, 2 Aug 2011 11:13:16 -0700
Subject: Re: Trying to find the problem with a broken pipe

The link (which I may be misreading) is
http://groups.google.com/group/hector-users/browse_thread/thread/8d7004b6f85a0f2e

It only started happening today and has happened on 2 occasions (8:43 and 10:21), both while performing the same function (querying a column family).

It seems to be trying to access a connection on one of the servers.

The client accesses the first node:

2011-08-02 08:43:06,541 ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<cassandradevrk1:9393-33>
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
...
2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk1:9393-33>
...
2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host cassandradevrk1(10.130.202.34):9393
2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49
2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
2011-08-02 08:43:06,544 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
2011-08-02 08:43:06,544 INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: cassandradevrk1(10.130.202.34):9393
2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk1:9393-33>
2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset

Then it appears to try the second node and fails:

2011-08-02 08:43:06,556 INFO [me.prettyprint.cassandra.connection.HConnectionManager] - Client CassandraClient<cassandradevrk1:9393-33> released to inactive or dead pool. Closing.
2011-08-02 08:43:06,557 ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<cassandradevrk2:9393-49>
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
2011-08-02 08:43:06,558 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host cassandradevrk2(10.130.202.35):9393
2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49
2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}
2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}
2011-08-02 08:43:06,559 INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: cassandradevrk2(10.130.202.35):9393
2011-08-02 08:43:06,560 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk2:9393-49>
2011-08-02 08:43:06,560 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset

The process is the same at 10:21.

*Are the exceptions related to any external events (e.g. node restarts, network issues...)?*
Not that I'm aware of, unless there are firewall timeouts between the application and the node servers. Let me find out. The Cassandra log files have no errors reported.

*What versions of Hector and Cassandra are you running?*
Cassandra 0.8.1, Hector 0.8.0-1

On Tue, Aug 2, 2011 at 10:37 AM, Jim Ancona wrote:

> On Tue, Aug 2, 2011 at 4:36 PM, Anthony Ikeda wrote:
> > I'm not sure if this is a problem with Hector or with Cassandra.
> > We seem to be seeing broken pipe issues with our connections on the client
> > side (Exception below). A bit of googling finds possibly a problem with the
> > amount of data we are trying to store, although I'm certain our datasets are
> > not all that large.
>
> I'm not sure what you're referring to here. Large requests could lead
> to timeouts, but that's not what you're seeing here. Could you link to
> the page you're referencing?
>
> > A nodetool ring command doesn't seem to present any downed nodes:
> > Address         DC          Rack   Status State   Load       Owns    Token
> >                                                                      153951716904446304929228999025275230571
> > 10.130.202.34   datacenter1 rack1  Up     Normal  470.74 KB  79.19%  118538200848404459763384037192174096102
> > 10.130.202.35   datacenter1 rack1  Up     Normal  483.63 KB  20.81%  153951716904446304929228999025275230571
> >
> > There are no errors in the cassandra server logs.
> >
> > Are there any particular timeouts on connections that we need to be aware
> > of? Or perhaps configure on the Cassandra nodes? Is this purely an issue
> > with the Hector API configuration?
>
> There is a server side timeout (rpc_timeout_in_ms in cassandra.yaml)
> and a Hector client-side timeout
> (CassandraHostConfigurator.cassandraThriftSocketTimeout). But again,
> the "Broken pipe" error is not a timeout; it indicates that something
> happened to the underlying network socket. For example you will see
> those when a server node is restarted.
>
> Some questions that might help troubleshoot this:
> How often are these occurring?
> Does this affect both nodes in the cluster or just one?
> Are the exceptions related to any external events (e.g. node restarts, network issues...)?
> What versions of Hector and Cassandra are you running?
>
> Keep in mind that failures like this will normally be retried by
> Hector, resulting in no loss of data. For that reason, I think that
> exception is logged as a warning in the newest Hector versions.
>
> We've seen something similar, but more catastrophic because it affects
> connectivity to the entire cluster, not just a single node. See this
> post for more details: http://goo.gl/hrgkw So far we haven't
> identified the cause.
>
> Jim
>
> > Anthony
> >
> > 2011-08-02 08:43:06,541 ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<cassandradevrk1:9393-33>
> > org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
> >     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
> >     at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
> >     at me.prettyprint.cassandra.connection.HThriftClient.close(HThriftClient.java:85)
> >     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
> >     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
> >     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
> >     at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
> >     at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
> >     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
> >     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
> >     at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
> >     at com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.findRegistryByProfileId(StandardFindRegistryPersistenceStrategy.java:237)
> >     at com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.execute(StandardFindRegistryPersistenceStrategy.java:277)
> >     at com.wsgc.services.registry.registryservice.impl.service.StandardRegistryService.getRegistriesByProfileId(StandardRegistryService.java:327)
> >     at com.wsgc.services.registry.webapp.impl.RegistryServicesController.getRegistriesByProfileId(RegistryServicesController.java:247)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
> >     at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
> >     at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
> >     at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
> >     at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
> >     at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
> >     at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
> >     at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:563)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >     at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
> >     at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
> >     at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
> >     at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
> >     at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
> >     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
> >     at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.SocketException: Broken pipe
> >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
> >     ... 47 more
> > 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host cassandradevrk1(10.130.202.34):9393
> > 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49
> > 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
> > 2011-08-02 08:43:06,544 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
> > 2011-08-02 08:43:06,544 INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: cassandradevrk1(10.130.202.34):9393
> > 2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk1:9393-33>
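
To make the distinction drawn in the quoted reply concrete (a "Broken pipe" is a socket-level failure, not a timeout), here is a minimal Java sketch that walks an exception's cause chain, the same chain the "Caused by:" lines in the trace show, and sorts the failure into one of the two groups. The class name, the enum, and the use of RuntimeException as a stand-in for the Thrift/Hector wrapper exceptions are all assumptions made for illustration; none of this is part of Hector or Thrift. It relies only on the JDK fact that read timeouts surface as java.net.SocketTimeoutException, while "Broken pipe" and "Connection reset" surface as java.net.SocketException.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of Hector or Thrift.
public class TransportFailureClassifier {

    enum Kind { TIMEOUT, CONNECTION_DROPPED, OTHER }

    // Walk the cause chain (the "Caused by:" entries of a stack trace)
    // and report the first socket-level exception found.
    static Kind classify(Throwable failure) {
        for (Throwable c = failure; c != null; c = c.getCause()) {
            if (c instanceof SocketTimeoutException) {
                // The peer was reachable but did not answer in time.
                return Kind.TIMEOUT;
            }
            if (c instanceof SocketException) {
                // E.g. "Broken pipe" or "Connection reset": the socket itself failed.
                return Kind.CONNECTION_DROPPED;
            }
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        // RuntimeException stands in for the library's wrapper exceptions (assumption).
        Throwable brokenPipe = new RuntimeException(new SocketException("Broken pipe"));
        Throwable timedOut = new RuntimeException(new SocketTimeoutException("Read timed out"));

        System.out.println(classify(brokenPipe)); // CONNECTION_DROPPED
        System.out.println(classify(timedOut));   // TIMEOUT
    }
}
```

Applied to the trace above, the root cause java.net.SocketException: Broken pipe would land in the CONNECTION_DROPPED group, which is consistent with looking at the network path (restarts, firewalls) rather than at the timeout settings.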