cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
Date Wed, 13 Apr 2016 16:31:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239551#comment-15239551 ]

Paulo Motta commented on CASSANDRA-9935:
----------------------------------------

Hey [~ruoranwang], thanks for the report and for helping troubleshoot the issue.

Do you have any update on this? Your patch might work, but possibly at the expense of
performance: the default {{getScanner}} implementation creates an {{IScanner}} instance
for each sstable, whereas CASSANDRA-4142 improved this for LCS to use one scanner per level,
which makes iteration faster.

I suspect a race condition: a compaction adds an sstable to a level, or removes one from it,
during validation, while a {{LeveledScanner}} is created on the assumption that there are no
overlaps within each level, so we get the {{received out of order AssertionError}}.
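
To make the assumption concrete, here is a minimal sketch of the invariant a per-level scanner relies on. It is not Cassandra code: {{TokenRange}} and {{LevelOverlapSketch}} are hypothetical names, and an sstable is reduced to its first and last token. If the check fails for a level, reading its sstables one after another can return keys out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an sstable: only its token bounds matter here.
final class TokenRange {
    final long first;
    final long last;

    TokenRange(long first, long last) {
        this.first = first;
        this.last = last;
    }
}

final class LevelOverlapSketch {
    // Returns true when no two ranges in the level intersect, i.e. when
    // concatenating the sstables in order of their first token would yield
    // keys in ascending order.
    static boolean isNonOverlapping(List<TokenRange> level) {
        List<TokenRange> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingLong((TokenRange r) -> r.first));
        for (int i = 1; i < sorted.size(); i++) {
            // A range starting at or before the previous range's end overlaps it.
            if (sorted.get(i).first <= sorted.get(i - 1).last) {
                return false;
            }
        }
        return true;
    }
}
```

If a compaction changes a level after the list of sstables was taken, the list handed to the scanner may no longer satisfy this check even though the level itself does.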

I created a [patch|https://github.com/apache/cassandra/commit/a8c573547677f97b875583b8992155e7333659c3]
that might solve this by verifying that each sstable's level matches its level in the
current manifest, so we can guarantee that the sstables within a level do not overlap. If the
levels do not match, the sstable was added or removed recently, so we create an exclusive
scanner for that sstable and it is merged correctly during validation.
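
Roughly, the idea can be sketched as follows. This is an illustration only, not the code in the patch: {{SSTableRef}}, {{ScannerPlan}} and {{LevelGroupingSketch}} are hypothetical names, the manifest is modelled as a plain map from level to sstable names, and the special handling that real LCS gives level 0 is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for an sstable: a name plus the level it claims to be in.
final class SSTableRef {
    final String name;
    final int recordedLevel;

    SSTableRef(String name, int recordedLevel) {
        this.name = name;
        this.recordedLevel = recordedLevel;
    }
}

// Which sstables may share a per-level scanner and which need their own.
final class ScannerPlan {
    final Map<Integer, List<SSTableRef>> perLevel = new TreeMap<>();
    final List<SSTableRef> exclusive = new ArrayList<>();
}

final class LevelGroupingSketch {
    // manifest: level -> names of the sstables currently in that level (assumed shape).
    static ScannerPlan plan(List<SSTableRef> toScan, Map<Integer, Set<String>> manifest) {
        ScannerPlan plan = new ScannerPlan();
        for (SSTableRef sstable : toScan) {
            Set<String> level = manifest.get(sstable.recordedLevel);
            if (level != null && level.contains(sstable.name)) {
                // The manifest agrees with the recorded level: safe to share a scanner.
                plan.perLevel
                    .computeIfAbsent(sstable.recordedLevel, k -> new ArrayList<>())
                    .add(sstable);
            } else {
                // Level changed under us: scan this sstable on its own.
                plan.exclusive.add(sstable);
            }
        }
        return plan;
    }
}
```

Each entry in {{perLevel}} would back one scanner for that level, and each entry in {{exclusive}} would get a scanner of its own, so the common case keeps the one-scanner-per-level behaviour.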

Are you able to build a custom jar with that patch and check whether it solves the issue? I'm
attaching a .patch file to this ticket so you can apply it to your custom branch.

> Repair fails with RuntimeException
> ----------------------------------
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Yuki Morishita
>             Fix For: 2.1.x
>
>         Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702). After upgrading to 2.1.8 it started to work faster, but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>         at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> The last records in the logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (8063716953988492222,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above that I see (at least twice in the attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
>         at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.8.jar:2.1.8]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.8.jar:2.1.8]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
>         ... 1 common frames omitted
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         ... 3 common frames omitted
> INFO  [Thread-173887] 2015-07-29 20:44:03,854 StorageService.java:2952 - Repair session 846d9300-3608-11e5-a93e-4963524a8bde for range (-6705935742755245856,-6704072966568763453] finished
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
