Thanks Aaron! Here is the exception. Is that a timeout between nodes? Is there any parameter I can change to reduce the timeouts?

me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
        at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99)
        at com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
        at com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
        at com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
        at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
        at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
        at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
        at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
        at com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
        at com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
        at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
        at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698)
        at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106)
        ... 21 more
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 31 more
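You can work out which side timed out by walking the exception's cause chain. Per Aaron's reply further down, a server-side timeout surfaces as Thrift's `TimedOutException`. The trace above instead bottoms out in `java.net.SocketTimeoutException: Read timed out`, meaning the client's own socket read gave up while waiting for a reply. That only tells you which timer fired: an overloaded server can still be the underlying cause. Below is a minimal sketch of a hypothetical helper (`TimeoutClassifier` is not part of Hector). It assumes you can match `TimedOutException` by simple class name, which lets the sketch compile without the Cassandra jars. With those jars on the classpath, a real `instanceof` check would be stricter.

```java
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class TimeoutClassifier {

    /** Which timer appears to have fired, judged only from the exception's cause chain. */
    enum Origin { SERVER_SIDE, CLIENT_SOCKET, UNKNOWN }

    static Origin classify(Throwable t) {
        Origin result = Origin.UNKNOWN;
        // Identity set guards against (rare) cycles in the cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            // Assumption: match Thrift's TimedOutException by simple name so this
            // sketch compiles without the Cassandra jars on the classpath.
            if ("TimedOutException".equals(cur.getClass().getSimpleName())) {
                return Origin.SERVER_SIDE;
            }
            if (cur instanceof SocketTimeoutException) {
                result = Origin.CLIENT_SOCKET;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as the trace above: a transport wrapper around a socket read timeout.
        Throwable trace = new RuntimeException("TTransportException",
                new SocketTimeoutException("Read timed out"));
        System.out.println(classify(trace)); // CLIENT_SOCKET
    }
}
```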


And here is the tpstats output:

[cassy@s2.dsat4 ~]$  ~/bin/nodetool -h localhost tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         3         3      414129625         0                 0
RequestResponseStage              0         0      300591600         0                 0
MutationStage                     0         0       96585276         0                 0
ReadRepairStage                   0         0       94185465         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        2684813         0                 0
AntiEntropyStage                  0         0           5436         0                 0
MigrationStage                    0         0             22         0                 0
MemtablePostFlusher               0         0           3553         0                 0
StreamStage                       0         0            167         0                 0
FlushWriter                       0         0           3582         0                23
MiscStage                         0         0           1163         0                 0
AntiEntropySessions               0         0            399         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0           2746         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR              17931
BINARY                       0
READ                   5185149
MUTATION                232317
REQUEST_RESPONSE          1317
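Aaron's suggestion below is to look for "Pending" tasks in tpstats. The dropped-message table is also worth watching when comparing nodes: here READ alone shows 5185149 dropped messages, which fits the overload he describes. Below is a minimal sketch, under stated assumptions, that pulls out both tables so the output from several nodes can be compared. `TpstatsSummary` is a hypothetical helper. It assumes the column layout shown above (six whitespace-separated fields per pool row, two per dropped-message row). Other Cassandra versions may print different columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TpstatsSummary {

    // True if every field from index `from` onward is a non-negative integer.
    private static boolean allDigits(String[] f, int from) {
        for (int i = from; i < f.length; i++) {
            if (!f[i].matches("\\d+")) return false;
        }
        return true;
    }

    /** Pool name -> Pending, for thread-pool rows whose Pending column is non-zero. */
    static Map<String, Long> pendingPools(String tpstats) {
        Map<String, Long> pending = new LinkedHashMap<>();
        for (String line : tpstats.split("\n")) {
            String[] f = line.trim().split("\\s+");
            // Pool rows: name, Active, Pending, Completed, Blocked, All time blocked.
            if (f.length == 6 && allDigits(f, 1) && Long.parseLong(f[2]) > 0) {
                pending.put(f[0], Long.parseLong(f[2]));
            }
        }
        return pending;
    }

    /** Message type -> Dropped, for every row of the dropped-message table. */
    static Map<String, Long> droppedMessages(String tpstats) {
        Map<String, Long> dropped = new LinkedHashMap<>();
        for (String line : tpstats.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length == 2 && allDigits(f, 1)) {
                dropped.put(f[0], Long.parseLong(f[1]));
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // A few rows copied from the tpstats output above.
        String sample = String.join("\n",
            "Pool Name                    Active   Pending      Completed   Blocked  All time blocked",
            "ReadStage                         3         3      414129625         0                 0",
            "MutationStage                     0         0       96585276         0                 0",
            "FlushWriter                       0         0           3582         0                23",
            "",
            "Message type           Dropped",
            "READ_REPAIR              17931",
            "READ                   5185149",
            "MUTATION                232317");
        System.out.println("Pending: " + pendingPools(sample));     // Pending: {ReadStage=3}
        System.out.println("Dropped: " + droppedMessages(sample));  // Dropped: {READ_REPAIR=17931, READ=5185149, MUTATION=232317}
    }
}
```

Feeding it the `nodetool tpstats` output from every server makes it easier to spot whether one node is consistently more backed up than the rest.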






On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aaron@thelastpickle.com> wrote:
You need to see if the timeout is from the client to the server, or between the server nodes.

If it's server side, a TimedOutException will be thrown from Thrift. Take a look at nodetool tpstats on the servers; you will probably see lots of "Pending" tasks. Basically, the cluster is overloaded. Consider:

* checking the IO, CPU, and GC state on the servers.
* ensuring the data and requests are evenly spread around the cluster.
* reducing the number of columns read in a select.
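Here is one way to act on the last point. It assumes the application knows which column names it needs per row: issue several narrower selects instead of one very wide one. `ColumnBatches.chunk` below is a hypothetical helper, not part of Hector. The sketch leaves out how each batch becomes a CQL select or a Hector slice query, because that depends on the schema. Smaller batches mean more round trips but less work per request, so treat the batch size as a tuning knob rather than a fixed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnBatches {

    /** Splits the requested column names into consecutive batches of at most batchSize names. */
    static <T> List<List<T>> chunk(List<T> columns, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < columns.size(); i += batchSize) {
            // subList is a view; copy it so each batch is independent of the source list.
            batches.add(new ArrayList<>(columns.subList(i, Math.min(i + batchSize, columns.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        // Each inner list would become its own, smaller read instead of one wide select.
        System.out.println(chunk(wanted, 2)); // [[c1, c2], [c3, c4], [c5]]
    }
}
```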

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/04/2012, at 5:30 AM, Daning Wang wrote:

> Hi all,
>
> We are using Hector, and we often see lots of timeout exceptions in the log. I know that Hector can fail over to another node, but I want to reduce the number of timeouts.
>
> Is there any Hector parameter I should change to reduce this error?
>
> Also, is there any tuning we need to do on the server side for the timeouts?
>
>
> Thanks in advance.
>
>
> 12/04/04 15:13:20 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160
> 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.123.83.114(10.123.83.114):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.6.115.239(10.6.115.239):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:49 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.120.205.48(10.120.205.48):9160
> 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
> 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.20.200(10.28.20.200):9160
> 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
> 12/04/04 15:13:51 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
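To see whether the timeouts concentrate on one node or are spread across the cluster, count Hector's mark-down events per host. In the excerpt above, six different hosts are marked down within about 25 seconds. That fits Aaron's reading of a generally overloaded cluster better than a single bad node, though a short excerpt is not conclusive. Below is a minimal sketch of a hypothetical helper (`MarkDownCounter` is not part of Hector). Its regular expression is written against the message format visible in these lines and may need adjusting for other Hector versions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkDownCounter {

    // Matches "MARK HOST AS DOWN TRIGGERED for host <name>(<ip>):<port>" as seen in the log above.
    private static final Pattern MARK_DOWN =
        Pattern.compile("MARK HOST AS DOWN TRIGGERED for host [^(]+\\(([^)]+)\\):(\\d+)");

    /** Counts mark-down events per "ip:port"; unrelated lines are ignored. */
    static Map<String, Integer> countByHost(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = MARK_DOWN.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + ":" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three lines copied from the log above.
        List<String> log = Arrays.asList(
            "12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160",
            "12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160",
            "12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160");
        System.out.println(countByHost(log)); // {10.240.113.171:9160=1, 10.28.78.123:9160=2}
    }
}
```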