cassandra-user mailing list archives

From Anup Shirolkar <anup.shirol...@instaclustr.com>
Subject Re: Cassandra 3.7 - Problem with Repairs - all nodes failing
Date Fri, 20 Apr 2018 03:34:18 GMT
Contd.

Upgrading from 3.7 to 3.11.1 does not involve any major changes.
It can be done without downtime and should not affect
Cassandra clients.
If you are considering upgrading prod, you can first test the upgrade
on a test cluster to be sure.
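
To compare repair behaviour on the test cluster before and after the upgrade, the two invocations described in the quoted message below could be generated along these lines. This is only a minimal sketch: `-pr` and `-dc` are the nodetool repair options mentioned in that message, the keyspace and table names (`secure`, `clients`) are taken from the quoted log, and the datacenter name `DC1` and both helper functions are hypothetical placeholders.

```python
import shlex


def repair_command(keyspace, table, extra_options=()):
    """Build the argument list for a `nodetool repair` run on a single table."""
    return ["nodetool", "repair", *extra_options, keyspace, table]


def repair_plan(tables_by_keyspace, extra_options=()):
    """Return one repair command per table, in keyspace/table order."""
    return [
        repair_command(keyspace, table, extra_options)
        for keyspace, tables in tables_by_keyspace.items()
        for table in tables
    ]


if __name__ == "__main__":
    # Assumption: keyspace/table names come from the quoted log; DC1 is a placeholder.
    tables = {"secure": ["clients"]}
    plan = repair_plan(tables, ["-pr"]) + repair_plan(tables, ["-dc", "DC1"])
    for command in plan:
        # Print the commands instead of running them, so they can be reviewed first.
        print(shlex.join(command))
```

The commands are only printed, so they can be run by hand on each node in sequence, as in the original setup.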

Thanks,
Anup

On 20 April 2018 at 13:28, Anup Shirolkar <anup.shirolkar@instaclustr.com>
wrote:

> Hi Leena,
>
> The repairs are most likely failing because of a bug in Cassandra 3.7.
> I don't have a JIRA reference handy, but there are quite a few issues in
> this version.
>
> Considering your scenario, upgrading to 3.11.1 is highly recommended.
> Although you have mentioned that upgrading is not an option, I would like
> to tell you that
>
> On 19 April 2018 at 23:19, Leena Ghatpande <lghatpande@hotmail.com> wrote:
>
>> We have an 8-node prod cluster running Cassandra 3.7. Our 2 largest
>> tables have around 100M and 30M rows respectively, while all the others
>> are relatively small.
>>
>> We have been running repairs on alternate days on 2 of our keyspaces.
>> We run repair on each node in the cluster with the -pr option on every
>> table within each keyspace individually. Repairs are run sequentially on
>> each node.
>> These were working fine, but with no change to the systems, they
>> started failing last month.
>>
>> The repairs have started failing for each table on every node with no
>> specific error.
>>
>> I have tried running scrub on every table and then running repair, but
>> the repair still fails for all tables.
>>
>> Our smallest table with only 100 rows also fails on repair.
>>
>> But if I run the repair with the DC option (-dc localdatacenter) for local
>> datacenters, then the repairs succeed. Is this an indication that the
>> repairs are good?
>> We would still want the repairs to work on individual tables as
>> expected.
>>
>> We need help getting the repairs to work properly, as we have a big
>> migration planned for June.
>>
>> Upgrading Cassandra is not an option right now.
>>
>>
>> Here are some of the errors
>> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
>> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree
>> for clients from / IP
>> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461 Validator.java:261
>> - Failed creating a merkle tree for [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d
>> on secure/clients, [(1849652111528073119,1856811324137977760],
>> (3733211856223440695,3737790228588239952], (-2500456349659149537,-2498953852677197491],
>> (1735271399836012489,1735412813423041471], (1871725370007007817,1890457592856328448],
>> (4316163881057906640,4323247409810431754], (4286141602946572160,4308169130179803373],
>> (5189663040558066167,5193871822490506231], (7160723554094225326,7161133449395023060],
>> (-4363807597425543488,-4361416517953194804],
>> (7008956720664744733,7022523551326267501], (-5742986989228874052,-5734436401879059890],
>> (1828335330499002859,1849652111528073119], (7072368932695202361,7144087505892848370],
>> (-5791935107311742541,-5781988493712029404],
>> (7754917992280096132,7754953485457609099]]], /130.5.123.234 (see log for
>> details)
>> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461
>> CassandraDaemon.java:217 - Exception in thread
>> Thread[ValidationExecutor:213,1,main]
>> java.lang.NullPointerException: null
>> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
>> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree
>> for clients from /IP
>> ERROR [Repair#113:12] 2018-04-18 20:36:51,461 CassandraDaemon.java:217 -
>> Exception in thread Thread[Repair#113:12,5,RMI Runtime]
>> com.google.common.util.concurrent.UncheckedExecutionException:
>> org.apache.cassandra.exceptions.RepairException: [repair
>> #223c73c2-4372-11e8-8749-89fc1dde5b7d on secure/clients,
>> [(1849652111528073119,1856811324137977760],
>> (3733211856223440695,3737790228588239952], (-2500456349659149537,-2498953852677197491],
>> (1735271399836012489,1735412813423041471], (1871725370007007817,1890457592856328448],
>> (4316163881057906640,4323247409810431754], (4286141602946572160,4308169130179803373],
>> (5189663040558066167,5193871822490506231], (7160723554094225326,7161133449395023060],
>> (-4363807597425543488,-4361416517953194804],
>> (7008956720664744733,7022523551326267501], (-5742986989228874052,-5734436401879059890],
>> (1828335330499002859,1849652111528073119], (7072368932695202361,7144087505892848370],
>> (-5791935107311742541,-5781988493712029404],
>> (7754917992280096132,7754953485457609099]]] Validation failed in /
>> 130.5.127.60
>>         at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1525)
>> ~[guava-18.0.jar:na]
>>         at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1511)
>> ~[guava-18.0.jar:na]
>>         at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> ~[na:1.8.0_45]
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ~[na:1.8.0_45]
>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair
>> #223c73c2-4372-11e8-8749-89fc1dde5b7d on clients,
>> [(1849652111528073119,1856811324137977760],
>> (3733211856223440695,3737790228588239952], (-2500456349659149537,-2498953852677197491],
>> (1735271399836012489,1735412813423041471], (1871725370007007817,1890457592856328448],
>> (4316163881057906640,4323247409810431754], (4286141602946572160,4308169130179803373],
>> (5189663040558066167,5193871822490506231], (7160723554094225326,7161133449395023060],
>> (-4363807597425543488,-4361416517953194804],
>> (7008956720664744733,7022523551326267501], (-5742986989228874052,-5734436401879059890],
>> (1828335330499002859,1849652111528073119], (7072368932695202361,7144087505892848370],
>> (-5791935107311742541,-5781988493712029404],
>> (7754917992280096132,7754953485457609099]]] Validation failed in /
>> 130.5.127.60
>>         at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:439)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:169)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
>> ~[apache-cassandra-3.7.jar:3.7]
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> ~[na:1.8.0_45]
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> ~[na:1.8.0_45]
>
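
When many such errors pile up, it can help to list which endpoint reported each failed validation. The following is a minimal sketch, assuming the log lines have been un-wrapped (no mail quoting or line breaks inside the message) and follow the `RepairException` message format shown in the quoted log. The function name and the sample string, which is abbreviated to the first token range, are illustrative only.

```python
import re

# Matches "[repair #<id> on <keyspace>/<table>, [<ranges>]] Validation failed in /<ip>".
# The keyspace part is optional because the "Caused by" line in the log omits it.
FAILED_VALIDATION = re.compile(
    r"\[repair #(?P<session>[0-9a-f-]+) on "
    r"(?:(?P<keyspace>\w+)/)?(?P<table>\w+), "
    r"[^/]*?\] Validation failed in /\s*(?P<endpoint>[\d.]+)"
)


def failed_validations(log_text):
    """Return (session, keyspace, table, endpoint) for each failed validation."""
    return [
        (m["session"], m["keyspace"], m["table"], m["endpoint"])
        for m in FAILED_VALIDATION.finditer(log_text)
    ]


# Abbreviated from the log above: only the first token range is kept.
SAMPLE = (
    "org.apache.cassandra.exceptions.RepairException: "
    "[repair #223c73c2-4372-11e8-8749-89fc1dde5b7d on secure/clients, "
    "[(1849652111528073119,1856811324137977760]]] "
    "Validation failed in /130.5.127.60"
)

if __name__ == "__main__":
    for session, keyspace, table, endpoint in failed_validations(SAMPLE):
        print(f"{session}: {keyspace}/{table} failed validation on {endpoint}")
```

Grouping the results by endpoint would show whether the failures come from one node or from all of them.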
