cassandra-user mailing list archives

From Clint Kelly <>
Subject Re: Cassandra process exiting mysteriously
Date Wed, 06 Aug 2014 04:43:41 GMT
Hi Kevin,

Thanks for your reply.  That is what I assumed, but some of the posts
I read on Stack Overflow (e.g., the one that I referenced in my mail)
suggested otherwise.  I was just curious if others had experienced OOM
problems that weren't logged or if there were other common culprits.

Best regards,

On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton <> wrote:
> If there is an OOM, it will be in the logs.
> On Aug 5, 2014 8:17 PM, "Clint Kelly" <> wrote:
>> Hi everyone,
>> For some integration tests, we start up a CassandraDaemon in a
>> separate process (using the Java 7 ProcessBuilder API).  All of my
>> integration tests run beautifully on my laptop, but one of them fails
>> on our Jenkins cluster.
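A minimal sketch of launching a child JVM with Java 7's ProcessBuilder so that everything it writes to stdout and stderr lands in a file. This matters for a harness like the one described: fatal JVM output often goes to stderr rather than through the application's logger, and per the `Process` javadoc a child whose streams are never read or redirected can even block once the pipe buffer fills. The class `ChildJvm`, its methods, and any log path are hypothetical. `isRunning` uses `exitValue()`, which throws `IllegalThreadStateException` while the child is still alive, because `Process.isAlive()` only arrived in Java 8.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;
import java.util.ArrayList;
import java.util.List;

final class ChildJvm {
    private ChildJvm() {}

    /** Path to the java binary of the JVM running this code. */
    static String javaBinary() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    /**
     * Starts "java <args...>" with stderr merged into stdout and both
     * appended to logFile, so fatal-error output outlives the child.
     */
    static Process start(List<String> args, File logFile) throws IOException {
        List<String> command = new ArrayList<>();
        command.add(javaBinary());
        command.addAll(args);
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                  // stderr goes wherever stdout goes
        pb.redirectOutput(Redirect.appendTo(logFile)); // append, so earlier runs' output is kept
        return pb.start();
    }

    /** Java 7-compatible liveness check (Process.isAlive() needs Java 8). */
    static boolean isRunning(Process p) {
        try {
            p.exitValue(); // throws IllegalThreadStateException if not exited yet
            return false;
        } catch (IllegalThreadStateException stillRunning) {
            return true;
        }
    }
}
```

In the setup described, the argument list would hold something like `-cp <test classpath>` followed by `org.apache.cassandra.service.CassandraDaemon`; after a failure, the harness can check `isRunning` and then read the log file instead of relying only on Cassandra's own log.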
>> The failing integration test does around 10k writes to different rows
>> and then 10k reads.  After running some number of reads, the job dies
>> with this error:
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: / (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
>> This error appears to have occurred because the Cassandra process has
>> stopped.  The logs for the Cassandra process show some warnings during
>> batch writes (the batches are too big), no activity for a few minutes
>> (I assume this is because all of the read operations were proceeding
>> smoothly), and then look like the following:
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903 (line 141) Stop listening to thrift clients
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 (line 182) Stop listening for CQL clients
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930 (line 1279) Announcing shutdown
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930 (line 683) Waiting for messaging service to quiesce
>>  INFO [ACCEPT-/] 2014-08-05 19:14:53,931 (line 923) MessagingService has terminated the accept() thread
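The `StorageServiceShutdownHook` lines suggest the JVM went through an orderly shutdown: JVM shutdown hooks run on `System.exit()` or on a catchable signal such as SIGTERM or SIGINT, but not when the process is killed with SIGKILL. Recording the child's exit status in the test harness can help narrow down which of these happened. As far as I know, on Linux OpenJDK's `Process.exitValue()` reports a child killed by signal N as 128 + N, and the JVM itself exits with 143 (128 + 15) after handling SIGTERM; treat that mapping as an assumption. The `ExitCodes` helper below is hypothetical and only names a few common Linux signal numbers.

```java
import java.util.HashMap;
import java.util.Map;

final class ExitCodes {
    private ExitCodes() {}

    // A few Linux signal numbers (numbering differs on some other Unixes).
    private static final Map<Integer, String> SIGNAL_NAMES = new HashMap<>();
    static {
        SIGNAL_NAMES.put(1, "SIGHUP");
        SIGNAL_NAMES.put(2, "SIGINT");
        SIGNAL_NAMES.put(6, "SIGABRT");
        SIGNAL_NAMES.put(9, "SIGKILL");
        SIGNAL_NAMES.put(15, "SIGTERM");
    }

    /** Turns a Process.exitValue() into a readable guess at the cause. */
    static String describe(int exitValue) {
        if (exitValue == 0) {
            return "normal exit (status 0)";
        }
        // Assumption: 129..192 means "killed by signal (exitValue - 128)".
        if (exitValue > 128 && exitValue <= 192) {
            int signal = exitValue - 128;
            String name = SIGNAL_NAMES.get(signal);
            if (name == null) {
                name = "signal " + signal;
            }
            return "probably terminated by " + name + " (status " + exitValue + ")";
        }
        return "exited with status " + exitValue;
    }
}
```

For example, `ExitCodes.describe(process.waitFor())` would give "probably terminated by SIGTERM (status 143)" for a child stopped with a plain `kill <pid>`. The word "probably" is deliberate: a program can also choose to exit with one of these numbers.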
>> Does anyone have any ideas about how to debug this?  Looking around on
>> Google, I found some threads suggesting that this could be caused by an
>> OOM error
>> (
>> Wouldn't such an error be logged, however?
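Independent of what Cassandra itself logs, the JVM can be asked to leave its own evidence behind. A minimal sketch, assuming a HotSpot/OpenJDK JVM: `-XX:+HeapDumpOnOutOfMemoryError` with `-XX:HeapDumpPath` writes an `.hprof` heap dump when an OutOfMemoryError is thrown, and `-XX:ErrorFile` controls where the `hs_err` report goes if the JVM crashes outright. The helper `OomDiagnostics.jvmArgs` and the choice of dump directory are hypothetical; the flags would go on the child's command line before the main class.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

final class OomDiagnostics {
    private OomDiagnostics() {}

    /**
     * HotSpot flags that make a JVM leave evidence when it dies:
     * a heap dump on OutOfMemoryError and an hs_err file on a fatal crash.
     */
    static List<String> jvmArgs(File dumpDir) {
        String dir = dumpDir.getAbsolutePath();
        return Arrays.asList(
            "-XX:+HeapDumpOnOutOfMemoryError",
            // A directory here means the dump is named java_pid<pid>.hprof.
            "-XX:HeapDumpPath=" + dir,
            // %p is expanded by the JVM to the process id.
            "-XX:ErrorFile=" + dir + File.separator + "hs_err_pid%p.log");
    }
}
```

If the child then disappears and neither a `.hprof` file nor an `hs_err` file shows up in that directory, an OutOfMemoryError or a JVM crash becomes a less likely explanation.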
>> The failing test exercises our Hadoop MapReduce InputFormat, so it
>> issues some fairly large queries across multiple rows (over a range of
>> partition key tokens).  I believe the default fetch size is 5000 rows,
>> and the values in the rows I am fetching are just simple strings, so I
>> would not expect the amount of data in a single read to be very large.
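The size argument can be made concrete with a rough estimate: one page of results is about fetch size × average row size. Using a hypothetical 200-byte average row (the real figure depends on the schema and value lengths, and this ignores per-cell overhead), 5000 rows × 200 bytes = 1,000,000 bytes, or roughly 1 MB per page. If a smaller page is wanted for the test, version 2.x of the DataStax Java driver exposes `Statement.setFetchSize(int)`. The `PageEstimate` helper below is hypothetical.

```java
final class PageEstimate {
    private PageEstimate() {}

    /** Approximate bytes in one page: rows per page times average row size. */
    static long pageBytes(int fetchSize, int avgRowBytes) {
        // Widen to long before multiplying so large inputs cannot overflow int.
        return (long) fetchSize * avgRowBytes;
    }
}
```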
>> FWIW, I don't see any log messages about garbage collection for at
>> least 3 minutes before the process shuts down (and no GC messages after
>> the test stops doing writes and starts doing reads).
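Since the quoted logs show no GC messages for the last few minutes, a dedicated GC log from the child JVM can confirm or rule out memory pressure more directly. A sketch assuming HotSpot/OpenJDK: on Java 8 and earlier the flags are `-Xloggc:<file>`, `-XX:+PrintGCDetails` and `-XX:+PrintGCDateStamps`; Java 9 replaced them with unified logging (`-Xlog:gc*:file=<file>`), and `-XX:+PrintGCDateStamps` is rejected there. The `GcLogging` helper is hypothetical, and choosing flags from the harness's own `java.specification.version` assumes the child is launched with the same JVM.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

final class GcLogging {
    private GcLogging() {}

    /** Major Java version from a spec string: "1.7" -> 7, "1.8" -> 8, "11" -> 11. */
    static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    /** GC-logging flags for a HotSpot JVM of the given major version. */
    static List<String> jvmArgs(int majorVersion, File logFile) {
        String path = logFile.getAbsolutePath();
        if (majorVersion <= 8) {
            // Pre-unified-logging flags (Java 7/8).
            return Arrays.asList("-Xloggc:" + path, "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps");
        }
        // Java 9+: every tag set that includes "gc", at the default info level, to a file.
        return Arrays.asList("-Xlog:gc*:file=" + path);
    }
}
```

Usage would be `GcLogging.jvmArgs(GcLogging.majorVersion(System.getProperty("java.specification.version")), gcLog)`, with the result added to the child's command line.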
>> I'd greatly appreciate any help before my team kills me for breaking
>> our Jenkins build so consistently!  :)
>> Best regards,
>> Clint
