cassandra-user mailing list archives

From Romain Hardouin <romainh...@yahoo.fr>
Subject Re: question of keyspace that just disappeared
Date Fri, 03 Mar 2017 12:56:43 GMT
I suspect a lack of 3.x reliability. Cassandra could have given up by dropping messages, but
not with a "drop keyspace". I mean, I have already seen Spark jobs with too many executors
produce a high load average on a DC. I have seen a C* node with a 1-minute load average of 140 that
still had a P99 read latency of 40 ms. But I have never seen a keyspace disappear. There are
old tickets regarding C* 1.x, but as far as I remember they were due to a create/drop/create of a keyspace.

    On Friday, 3 March 2017 at 13:44, George Webster <webstergd@gmail.com> wrote:
 

 Thank you for your reply and good to know about the debug statement. I haven't  
We never dropped or re-created the keyspace before. We haven't even performed writes to that
keyspace in months. I also checked the permissions of Apache; that user had read-only access.
Unfortunately, I reverted from a backend recently, so I can no longer say for sure whether I saw
something in system before the revert.
Anyway, hopefully it was just a fluke. We have some crazy ML libraries running on it, so maybe
Cassandra just gave up? Oh well, Cassandra is a champ and we haven't really had issues
with it before.
On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin <romainh_ml@yahoo.fr> wrote:

Did you inspect the system tables to see if there are any traces of your keyspace? Did you ever
drop and re-create this keyspace before that?
These lines appear in the debug log because the fd interval is > 2 seconds (the logged values are in nanoseconds). You
can override the intervals via the -Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms
properties. Are you sure you didn't have these lines in the debug logs before? I used to see them
a lot before I increased the intervals to 4 seconds.
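
To make the units concrete, here is a minimal Python sketch of the comparison described above. The function names and the default threshold are assumptions for illustration (the 2000 ms default mirrors the "> 2 seconds" figure mentioned here); this is not Cassandra's own code.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_MILLI = 1_000_000


def interval_seconds(interval_ns: int) -> float:
    """Convert a logged gossip interval from nanoseconds to seconds."""
    return interval_ns / NANOS_PER_SECOND


def exceeds_max_interval(interval_ns: int, max_interval_ms: int = 2000) -> bool:
    """Return True if the interval is above the given maximum (in ms).

    The 2000 ms default is an assumption taken from the 2-second
    figure in this thread.
    """
    return interval_ns > max_interval_ms * NANOS_PER_MILLI


# Two interval values copied from the debug log in the original post.
for ns in (2155674599, 3441841231):
    print(
        f"{ns} ns = {interval_seconds(ns):.3f} s, "
        f"above 2 s: {exceeds_max_interval(ns)}, "
        f"above 4 s: {exceeds_max_interval(ns, 4000)}"
    )
```

Both sample values are above 2 seconds but below 4 seconds, which is consistent with the lines becoming much rarer once the intervals are raised to 4 seconds.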
Best,
Romain

    On Tuesday, 28 February 2017 at 18:25, George Webster <webstergd@gmail.com> wrote:
 

 Hey Cassandra Users,
We recently encountered an issue where a keyspace just disappeared. I was curious if anyone
has had this occur before and can provide some insight.
We are using Cassandra 3.10, with 2 DCs of 3 nodes each. The data is still located in the storage
folder but is no longer visible inside Cassandra.
I searched the logs for any hints of errors or commands being executed that could have caused
the loss of a keyspace. Unfortunately, I found nothing. The only unusual thing I
saw in the logs was a series of read timeouts that occurred right around when the keyspace went away.
Since then I see numerous entries in the debug log like the following:
DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - Ignoring interval time of 2155674599 for /x.x.x..12
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 - Ignoring interval time of 2006530862 for /x.x.x..69
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 - Ignoring interval time of 2588593281 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 - Ignoring interval time of 2005305693 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 - Ignoring interval time of 2009244850 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 - Ignoring interval time of 2149192677 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 - Ignoring interval time of 2021180918 for /x.x.x.85
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436026101 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436187894 for /x.x.x.82
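
For anyone wanting to summarize entries like these, the following is a rough Python sketch that pulls the interval and peer out of each "Ignoring interval time" entry and keeps the largest value seen per peer. The regular expression and the function name are assumptions made for illustration, and the pattern only handles the redacted /x.x.x.N style addresses shown in this post.

```python
import re

# Matches "Ignoring interval time of <ns> for /<address>". Whitespace is
# matched loosely so entries that wrap across lines are still found.
ENTRY_RE = re.compile(r"Ignoring\s+interval\s+time\s+of\s+(\d+)\s+for\s+(/[x\d.]+)")

# Three entries copied from the debug log above.
SAMPLE = """\
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
"""


def worst_interval_by_peer(log_text: str) -> dict:
    """Return the largest ignored interval (in nanoseconds) per peer."""
    worst = {}
    for match in ENTRY_RE.finditer(log_text):
        interval_ns, peer = int(match.group(1)), match.group(2)
        worst[peer] = max(interval_ns, worst.get(peer, 0))
    return worst


for peer, interval_ns in sorted(worst_interval_by_peer(SAMPLE).items()):
    print(f"{peer}: worst ignored interval {interval_ns / 1e9:.3f} s")
```

Because the pattern is applied with finditer over the whole text, it also copes with entries that have been run together on one line.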
At the time the keyspace disappeared, we had two concurrent activities:
1) Running a Spark job (via HDP 2.5.3 in YARN) that was performing a countByKey. It was using the keyspace
that disappeared. The operation crashed.
2) We created a new keyspace to test out a schema. The only
"fancy" things in that keyspace are a few materialized view tables. Data was being loaded into
that keyspace during the crash. The load process was extracting information and then just
writing to Cassandra.
Any ideas? Anyone seen this before?
Thanks,
George

   



   