cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anthony Ikeda <anthony.ikeda....@gmail.com>
Subject Re: Trying to find the problem with a broken pipe
Date Tue, 02 Aug 2011 18:13:16 GMT
The link (which I may be misreading) is
http://groups.google.com/group/hector-users/browse_thread/thread/8d7004b6f85a0f2e

It's only started happening today and happened on 2 occassions (8:43 and
10:21) performing the same function (querying a column family).

It seems to be trying to access a connection on one of the servers

The client accesses the first node:

2011-08-02 08:43:06,541 ERROR
[me.prettyprint.cassandra.connection.HThriftClient] - Could not flush
transport (to be expected if the pool is shutting down) in close for
client: CassandraClient<cassandradevrk1:9393-33>
org.apache.thrift.transport.TTransportException:
java.net.SocketException: Broken pipe

...
2011-08-02 08:43:06,544 WARN
[me.prettyprint.cassandra.connection.HConnectionManager] - Could not
fullfill request on this host CassandraClient<cassandradevrk1:9393-33>

...

2011-08-02 08:43:06,543 ERROR
[me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST
AS DOWN TRIGGERED for host cassandradevrk1(10.130.202.34):9393
2011-08-02 08:43:06,543 ERROR
[me.prettyprint.cassandra.connection.HConnectionManager] - Pool state
on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393};
IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted:
49
2011-08-02 08:43:06,543 ERROR
[me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
2011-08-02 08:43:06,544 ERROR
[me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
2011-08-02 08:43:06,544 INFO
[me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host
detected as down was added to retry queue:
cassandradevrk1(10.130.202.34):9393
2011-08-02 08:43:06,544 WARN
[me.prettyprint.cassandra.connection.HConnectionManager] - Could not
fullfill request on this host CassandraClient<cassandradevrk1:9393-33>
2011-08-02 08:43:06,544 WARN
[me.prettyprint.cassandra.connection.HConnectionManager] - Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException:
org.apache.thrift.transport.TTransportException:
java.net.SocketException: Connection reset



Then it appears to try the second node and fails:

2011-08-02 08:43:06,556 INFO
[me.prettyprint.cassandra.connection.HConnectionManager] - Client
CassandraClient<cassandradevrk1:9393-33> released to inactive or dead
pool. Closing.
2011-08-02 08:43:06,557 ERROR
[me.prettyprint.cassandra.connection.HThriftClient] - Could not flush
transport (to be expected if the pool is shutting down) in close for
client: CassandraClient<cassandradevrk2:9393-49>

org.apache.thrift.transport.TTransportException:
java.net.SocketException: Broken pipe

2011-08-02 08:43:06,558 ERROR
[me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST
AS DOWN TRIGGERED for host cassandradevrk2(10.130.202.35):9393
2011-08-02 08:43:06,559 ERROR
[me.prettyprint.cassandra.connection.HConnectionManager] - Pool state
on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393};
IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted:
49
2011-08-02 08:43:06,559 ERROR
[me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}
2011-08-02 08:43:06,559 ERROR
[me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}
2011-08-02 08:43:06,559 INFO
[me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host
detected as down was added to retry queue:
cassandradevrk2(10.130.202.35):9393
2011-08-02 08:43:06,560 WARN
[me.prettyprint.cassandra.connection.HConnectionManager] - Could not
fullfill request on this host CassandraClient<cassandradevrk2:9393-49>
2011-08-02 08:43:06,560 WARN
[me.prettyprint.cassandra.connection.HConnectionManager] - Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException:
org.apache.thrift.transport.TTransportException:
java.net.SocketException: Connection reset

The process is the same at 10:21.

*Are the exceptions related to any external events (e.g. node
restarts, network issues...)?*
Not that I'm aware, unless there are firewall timeouts between the
application and the node servers. Let me find out. The cassandra log files
have no errors reported.

*What versions of Hector and Cassandra are you running?*
Cassandra 0.8.1, Hector 0.8.0-1




On Tue, Aug 2, 2011 at 10:37 AM, Jim Ancona <jim@anconafamily.com> wrote:

> On Tue, Aug 2, 2011 at 4:36 PM, Anthony Ikeda
> <anthony.ikeda.dev@gmail.com> wrote:
> > I'm not sure if this is a problem with Hector or with Cassandra.
> > We seem to be seeing broken pipe issues with our connections on the
> client
> > side (Exception below). A bit of googling finds possibly a problem with
> the
> > amount of data we are trying to store, although I'm certain our datasets
> are
> > not all that large.
>
> I'm not sure what you're referring to here. Large requests could lead
> to timeouts, but that's not what you're seeing here. Could you link to
> the page you're referencing?
>
> > A nodetool ring command doesn't seem to present any downed nodes:
> > Address         DC          Rack        Status State   Load
>  Owns
> >    Token
> >
> >   153951716904446304929228999025275230571
> > 10.130.202.34   datacenter1 rack1       Up     Normal  470.74 KB
> > 79.19%  118538200848404459763384037192174096102
> > 10.130.202.35   datacenter1 rack1       Up     Normal  483.63 KB
> > 20.81%  153951716904446304929228999025275230571
> >
> > There are no errors in the cassandra server logs.
> >
> > Are there any particular timeouts on connections that we need to be aware
> > of? Or perhaps configure on the Cassandra nodes? Is this purely and issue
> > with the Hector API configuration?
>
> There is a server side timeout (rpc_timeout_in_ms in cassandra.yaml)
> and a Hector client-side timeout
> (CassandraHostConfigurator.cassandraThriftSocketTimeout). But again,
> the "Broken pipe" error is not a timeout, it indicates that something
> happened to the underlying network socket. For example you will see
> those when a server node is restarted.
>
> Some questions that might help troubleshoot this:
> How often are these occurring?
> Does this affect both nodes in the cluster or just one?
> Are the exceptions related to any external events (e.g. node restarts,
> network issues...)?
> What versions of Hector and Cassandra are you running?
>
> Keep in mind that failures like this will normally be retried by
> Hector, resulting in no loss of data. For that reason, I think that
> exception is logged as a warning in the newest Hector versions.
>
> We've seen something similar, but more catastrophic because it affects
> connectivity to the entire cluster, not just a single node. See this
> post for more details: http://goo.gl/hrgkw So far we haven't
> identified the cause.
>
> Jim
>
> > Anthony
> >
> > 2011-08-02 08:43:06,541 ERROR
> > [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush
> > transport (to be expected if the pool is shutting down) in close for
> client:
> > CassandraClient<cassandradevrk1:9393-33>
> > org.apache.thrift.transport.TTransportException:
> java.net.SocketException:
> > Broken pipe
> >       at
> >
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
> >       at
> >
> org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
> >       at
> >
> me.prettyprint.cassandra.connection.HThriftClient.close(HThriftClient.java:85)
> >       at
> >
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
> >       at
> >
> me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
> >       at
> >
> me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
> >       at
> >
> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
> >       at
> >
> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
> >       at
> >
> me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
> >       at
> >
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
> >       at
> >
> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
> >       at
> >
> com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.findRegistryByProfileId(StandardFindRegistryPersistenceStrategy.java:237)
> >       at
> >
> com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.execute(StandardFindRegistryPersistenceStrategy.java:277)
> >       at
> >
> com.wsgc.services.registry.registryservice.impl.service.StandardRegistryService.getRegistriesByProfileId(StandardRegistryService.java:327)
> >       at
> >
> com.wsgc.services.registry.webapp.impl.RegistryServicesController.getRegistriesByProfileId(RegistryServicesController.java:247)
> >       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >       at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >       at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >       at java.lang.reflect.Method.invoke(Method.java:597)
> >       at
> >
> org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
> >       at
> >
> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
> >       at
> >
> org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
> >       at
> >
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
> >       at
> >
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
> >       at
> >
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
> >       at
> >
> org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
> >       at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> >       at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> >       at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> >       at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >       at
> >
> org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
> >       at
> >
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> >       at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >       at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >       at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >       at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >       at
> >
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:563)
> >       at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >       at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >       at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >       at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >       at
> org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
> >       at
> org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
> >       at
> org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
> >       at
> >
> org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
> >       at
> >
> org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
> >       at
> >
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
> >       at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.SocketException: Broken pipe
> >       at java.net.SocketOutputStream.socketWrite0(Native Method)
> >       at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >       at
> >
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
> >       ... 47 more
> > 2011-08-02 08:43:06,543 ERROR
> > [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS
> DOWN
> > TRIGGERED for host cassandradevrk1(10.130.202.34):9393
> > 2011-08-02 08:43:06,543 ERROR
> > [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on
> > shutdown:
> >
> <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393};
> > IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49
> > 2011-08-02 08:43:06,543 ERROR
> > [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
> > triggered on
> >
> <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
> > 2011-08-02 08:43:06,544 ERROR
> > [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown
> > complete on
> >
> <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}
> > 2011-08-02 08:43:06,544 INFO
> > [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host
> > detected as down was added to retry queue:
> > cassandradevrk1(10.130.202.34):9393
> > 2011-08-02 08:43:06,544 WARN
> > [me.prettyprint.cassandra.connection.HConnectionManager] - Could not
> > fullfill request on this host CassandraClient<cassandradevrk1:9393-33>
> >
>

Mime
View raw message