incubator-cassandra-user mailing list archives

From Clint Kelly <clint.ke...@gmail.com>
Subject Re: Cassandra process exiting mysteriously
Date Wed, 06 Aug 2014 04:43:41 GMT
Hi Kevin,

Thanks for your reply.  That is what I assumed, but some of the posts
I read on Stack Overflow (e.g., the one that I referenced in my mail)
suggested otherwise.  I was just curious if others had experienced OOM
problems that weren't logged or if there were other common culprits.

Best regards,
Clint



On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton <burton@spinn3r.com> wrote:
> If there is an OOM, it will be in the logs.
>
> On Aug 5, 2014 8:17 PM, "Clint Kelly" <clint.kelly@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> For some integration tests, we start up a CassandraDaemon in a
>> separate process (using the Java 7 ProcessBuilder API).  All of my
>> integration tests run beautifully on my laptop, but one of them fails
>> on our Jenkins cluster.
>>
>> The failing integration test does around 10k writes to different rows
>> and then 10k reads.  After running some number of reads, the job dies
>> with this error:
>>
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>> host(s) tried for query failed (tried: /127.0.0.10:58209
>> (com.datastax.driver.core.exceptions.DriverException: Timeout during
>> read))
>>
>> This error appears to have occurred because the Cassandra process has
>> stopped.  The logs for the Cassandra process show some warnings during
>> batch writes (the batches are too big), no activity for a few minutes
>> (I assume this is because all of the read operations were proceeding
>> smoothly), and then look like the following:
>>
>> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903
>> ThriftServer.java (line 141) Stop listening to thrift clients
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java
>> (line 182) Stop listening for CQL clients
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930
>> Gossiper.java (line 1279) Announcing shutdown
>>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930
>> MessagingService.java (line 683) Waiting for messaging service to
>> quiesce
>>  INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931
>> MessagingService.java (line 923) MessagingService has terminated the
>> accept() thread
>>
>> Does anyone have any ideas about how to debug this?  Looking around on
>> Google I found some threads suggesting that this could result from an
>> OOM error
>> (http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors).
>> Wouldn't such an error be logged, however?
>>
>> The test that fails is a test of our MapReduce Hadoop InputFormat and
>> as such it does some pretty big queries across multiple rows (over a
>> range of partitioning key tokens).  The default fetch size I believe
>> is 5000 rows, and the values in the rows I am fetching are just simple
>> strings, so I would not think the amount of data in a single read
>> would be too big.
>>
>> FWIW I don't see any log messages about garbage collection for at
>> least 3min before the process shuts down (and no GC messages after the
>> test stops doing writes and starts doing reads).
>>
>> I'd greatly appreciate any help before my team kills me for breaking
>> our Jenkins build so consistently!  :)
>>
>> Best regards,
>> Clint
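
One way to narrow this down is to have the test harness record how the child process exited. The StorageServiceShutdownHook lines in the log above indicate that the JVM ran its shutdown hooks, which happens on a normal exit or on SIGTERM but not on SIGKILL. Below is a minimal sketch of capturing and interpreting the child's exit code, assuming a Unix-like host where a signal-terminated child is reported as 128 plus the signal number. The class name ChildExitProbe, the helper names, the child.log file, and the `java -version` placeholder command are all illustrative and not taken from the original post.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ChildExitProbe {

    /**
     * Turn an exit code into a human-readable hint.
     * Assumes the common Unix convention: code = 128 + signal number.
     */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "clean exit (code 0)";
        }
        if (exitCode > 128) {
            int signal = exitCode - 128;
            switch (signal) {
                case 9:
                    return "killed by SIGKILL (signal 9); shutdown hooks do not run";
                case 15:
                    return "terminated by SIGTERM (signal 15); shutdown hooks run";
                default:
                    return "terminated by signal " + signal;
            }
        }
        return "exited with error code " + exitCode;
    }

    /** Start the command, append its stdout and stderr to the log file, and wait for it to exit. */
    static int runAndWait(List<String> command, File log)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log));
        Process child = pb.start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command; a real harness would launch the Cassandra daemon here.
        int code = runAndWait(Arrays.asList("java", "-version"), new File("child.log"));
        System.out.println(describe(code));
    }
}
```

If the recorded code corresponds to SIGTERM, something outside the JVM asked the process to stop, so the next step would be to check what on the build machine could be sending that signal. An out-of-memory kill by the operating system would normally arrive as SIGKILL instead, and would not leave shutdown-hook lines in the log.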
