cassandra-user mailing list archives

From Rahul Reddy <rahulreddy1...@gmail.com>
Subject Re: Rebooting one Cassandra node caused all the application nodes go down
Date Fri, 19 Jul 2019 18:43:52 GMT
Sorry, no corruption errors.

Thanks Jeff,

Is there anything specific to look into if this happens again?




On Fri, Jul 19, 2019, 2:40 PM Nitan Kainth <nitankainth@gmail.com> wrote:

> Do you mean no corruption errors, or that you do see corruption errors?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Jul 19, 2019, at 1:52 PM, Rahul Reddy <rahulreddy1234@gmail.com> wrote:
>
> Schema matches and corruption errors in system.log
>
> On Fri, Jul 19, 2019, 1:33 PM Nitan Kainth <nitankainth@gmail.com> wrote:
>
>> Do you see the schema in sync? Check with nodetool describecluster.
>>
>> Check system log for any corruption.
>>
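>> Something like this covers both checks (log path assumed to be the default
>> /var/log/cassandra/system.log):
>>
>>   nodetool describecluster   # all nodes should report the same schema version
>>   grep -i corrupt /var/log/cassandra/system.log   # e.g. CorruptSSTableException entries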
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Jul 19, 2019, at 12:32 PM, ZAIDI, ASAD A <az192g@att.com> wrote:
>>
>> “AWS asked to set nvme_timeout to a higher number in /etc/grub.conf.”
>>
>>
>>
>> Did you ask AWS whether setting a higher value is the real solution to the
>> bug? Is there no patch available to address the bug? Just curious to know.
>>
>>
>>
>> *From:* Rahul Reddy [mailto:rahulreddy1234@gmail.com]
>> *Sent:* Friday, July 19, 2019 10:49 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Rebooting one Cassandra node caused all the application nodes
>> go down
>>
>>
>>
>> Hi,
>>
>>
>>
>> We have 6 nodes in each of 2 data centers, us-east-1 and us-west-2, with RF 3
>> and CL set to LOCAL_QUORUM, using the gossip snitch. All our instances are
>> c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5
>> instance type had a bug for which AWS asked to set nvme_timeout to a higher
>> number in /etc/grub.conf. After setting the parameter, we ran nodetool drain
>> and rebooted the node in east.
>>
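>> For reference, the change was roughly along these lines; the exact parameter
>> name and value here are assumptions (nvme_core.io_timeout appended to the
>> kernel line in /etc/grub.conf, with the value commonly recommended for
>> NVMe-backed instances):
>>
>>   kernel /boot/vmlinuz-<version> ro root=<root> ... nvme_core.io_timeout=4294967295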
>>
>>
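>> The reboot itself was done roughly as below (nodetool drain flushes memtables
>> and stops the node from accepting writes before the OS restart):
>>
>>   nodetool drain
>>   sudo reboot
>>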
>> The instance came up, but Cassandra did not come up normally and had to be
>> started manually. Cassandra then came up but showed other instances as down.
>> Even though we didn't reboot any other node, the same thing was observed on
>> one other node as well. How could that happen? We don't see any errors in
>> system.log, which is set to INFO.
>>
>> Without any intervention, gossip settled in about 10 minutes and the entire
>> cluster became normal.
>>
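>> Next time this can be watched from any node with plain nodetool, e.g.:
>>
>>   nodetool status       # DN entries should flip back to UN as gossip converges
>>   nodetool gossipinfo   # per-endpoint STATUS, generation and heartbeat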
>>
>>
>> We tried the same thing in West and it happened again.
>>
>> I'm concerned about how to check what caused this, and how to avoid it if a
>> reboot happens again.
>>
>> If I just stop Cassandra instead of rebooting, I don't see this issue.
>>
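>> By stop I mean a clean service stop rather than an OS reboot; roughly the
>> following (assuming Cassandra runs as a systemd service named cassandra):
>>
>>   nodetool drain
>>   sudo systemctl stop cassandra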
>>
>>
>>
