cassandra-user mailing list archives

From Anubhav Kale <Anubhav.K...@microsoft.com.INVALID>
Subject RE: nodetool repair failure
Date Fri, 30 Jun 2017 17:26:05 GMT
If possible, simply read the table in question at consistency level ALL. This triggers a
read repair and is far more reliable than the nodetool command.
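
One way to issue such a read is through cqlsh, which has a CONSISTENCY command. The sketch below is only an illustration, not something from this thread: it builds the cqlsh input for a full read of one table. The function name and the keyspace/table names are placeholders, and it assumes plain unquoted CQL identifiers.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def full_read_script(keyspace: str, table: str) -> str:
    """Return cqlsh input that reads one table at consistency level ALL."""
    for name in (keyspace, table):
        if not _IDENT.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    return f"CONSISTENCY ALL;\nSELECT * FROM {keyspace}.{table};\n"


if __name__ == "__main__":
    # 'my_keyspace' and 'my_table' are placeholders, not names from the thread.
    print(full_read_script("my_keyspace", "my_table"), end="")
```

The output could be saved to a file and passed to cqlsh with its -f option. Keep in mind that a read at ALL fails if any replica for the requested data is unavailable, and a full scan of a large table may be slow.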

From: Balaji Venkatesan [mailto:venkatesan.balaji@gmail.com]
Sent: Thursday, June 29, 2017 7:26 PM
To: user@cassandra.apache.org
Subject: Re: nodetool repair failure

It did not help much. The other error I saw when repairing the keyspace was:

"Sync failed between /xx.xx.xx.93 and /xx.xx.xx.94". This was run from the .91 node.
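
When there are many such messages, it may help to pull out the endpoint pairs so you know which nodes' logs to look at. A minimal sketch, assuming the messages have exactly the form quoted above; the function name is made up for illustration.

```python
import re

# Matches messages such as: Sync failed between /10.0.0.93 and /10.0.0.94
SYNC_FAILED = re.compile(r"Sync failed between /([0-9A-Za-z.:]+) and /([0-9A-Za-z.:]+)")


def failed_sync_pairs(text):
    """Return the distinct endpoint pairs named in 'Sync failed' messages, in order."""
    pairs = []
    for a, b in SYNC_FAILED.findall(text):
        # Drop a trailing full stop that belongs to the sentence, not the address.
        pair = (a.rstrip("."), b.rstrip("."))
        if pair not in pairs:
            pairs.append(pair)
    return pairs
```

Feeding it the saved repair output (or a node's log) gives a short list of node pairs to investigate.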



On Thu, Jun 29, 2017 at 4:44 PM, Akhil Mehra <akhilmehra@gmail.com> wrote:
Run the following query and see if it gives you more information:

select * from system_distributed.repair_history;

Also, is there any additional logging on the nodes where the error is coming from? That seems
to be xx.xx.xx.94 for your last run.
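
The repair_history table can hold a lot of rows, so it may be easier to filter them once fetched. The sketch below assumes each row is available as a dictionary and that the columns are named status, keyspace_name, columnfamily_name, participants and exception_message, with failed sessions marked 'FAILED'. Those names are from memory of the 3.x schema and should be checked against your cluster (DESCRIBE TABLE system_distributed.repair_history). The function name is hypothetical.

```python
def failed_repairs(rows):
    """Summarise the rows of repair_history whose status is 'FAILED'.

    'rows' is any iterable of dict-like rows; the column names used here
    are assumptions about the schema, not taken from the thread.
    """
    summary = []
    for row in rows:
        if row.get("status") != "FAILED":
            continue
        summary.append({
            "table": f'{row.get("keyspace_name")}.{row.get("columnfamily_name")}',
            "participants": sorted(row.get("participants") or []),
            "error": row.get("exception_message"),
        })
    return summary
```

The participants and error text of the failed sessions indicate which nodes' system logs are worth reading first.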


On 30/06/2017, at 9:43 AM, Balaji Venkatesan <venkatesan.balaji@gmail.com> wrote:

The verify and scrub ran without any errors on the keyspace. I ran the repair again in trace
mode and still hit the same issue:


[2017-06-29 21:37:45,578] Parsing UPDATE system_distributed.parent_repair_history SET finished_at
= toTimestamp(now()), successful_ranges = {'....} WHERE parent_id=f1f10af0-5d12-11e7-8df9-59d19ef3dd23
[2017-06-29 21:37:45,580] Preparing statement
[2017-06-29 21:37:45,580] Determining replicas for mutation
[2017-06-29 21:37:45,580] Sending MUTATION message to /xx.xx.xx.95
[2017-06-29 21:37:45,580] Sending MUTATION message to /xx.xx.xx.94
[2017-06-29 21:37:45,580] Sending MUTATION message to /xx.xx.xx.93
[2017-06-29 21:37:45,581] REQUEST_RESPONSE message received from /xx.xx.xx.93
[2017-06-29 21:37:45,581] REQUEST_RESPONSE message received from /xx.xx.xx.94
[2017-06-29 21:37:45,581] Processing response from /xx.xx.xx.93
[2017-06-29 21:37:45,581] /xx.xx.xx.94: MUTATION message received from /xx.xx.xx.91
[2017-06-29 21:37:45,582] Processing response from /xx.xx.xx.94
[2017-06-29 21:37:45,582] /xx.xx.xx.93: MUTATION message received from /xx.xx.xx.91
[2017-06-29 21:37:45,582] /xx.xx.xx.95: MUTATION message received from /xx.xx.xx.91
[2017-06-29 21:37:45,582] /xx.xx.xx.94: Appending to commitlog
[2017-06-29 21:37:45,582] /xx.xx.xx.94: Adding to parent_repair_history memtable
[2017-06-29 21:37:45,582] Some repair failed
[2017-06-29 21:37:45,582] Repair command #3 finished in 1 minute 44 seconds
error: Repair job has failed with the error message: [2017-06-29 21:37:45,582] Some repair
failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2017-06-29 21:37:45,582]
Some repair failed
at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
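
A trace like the one above can be checked mechanically for messages the coordinator sent but never saw answered. This is only a rough sketch: it assumes coordinator lines have the form "[timestamp] Sending X message to /host" and "[timestamp] REQUEST_RESPONSE message received from /host", as in the excerpt, and the function name is made up. An unanswered endpoint in a trace that was cut short by the failure is a hint, not proof of a problem on that node.

```python
import re
from collections import Counter

# Coordinator-side lines only: the message follows the timestamp directly,
# whereas replica-side lines carry a "/host: " prefix and are ignored.
SENT = re.compile(r"^\[[^\]]+\] Sending \w+ message to /([0-9A-Za-z.:]+)")
ANSWERED = re.compile(r"^\[[^\]]+\] REQUEST_RESPONSE message received from /([0-9A-Za-z.:]+)")


def unanswered(trace_lines):
    """Return endpoints that were sent more messages than they answered."""
    pending = Counter()
    for line in trace_lines:
        sent = SENT.match(line)
        if sent:
            pending[sent.group(1)] += 1
            continue
        answered = ANSWERED.match(line)
        if answered:
            pending[answered.group(1)] -= 1
    return sorted(host for host, count in pending.items() if count > 0)
```

Run against the lines quoted above, it reports only xx.xx.xx.95, the one node with no REQUEST_RESPONSE recorded before the failure message.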



On Thu, Jun 29, 2017 at 1:36 PM, Subroto Barua <sbarua116@yahoo.com.invalid> wrote:
Balaji,

Are you repairing a specific keyspace/table? If the failure is tied to a table, try the 'verify'
and 'scrub' options on .91 and see if you get any errors.




On Thursday, June 29, 2017, 12:12:14 PM PDT, Balaji Venkatesan <venkatesan.balaji@gmail.com> wrote:


Thanks. I tried the trace option and there is not much info. Here are the last few log lines
before it failed.


[2017-06-29 19:01:54,969] /xx.xx.xx.93: Sending REPAIR_MESSAGE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Appending to commitlog
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Adding to repair_history memtable
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Enqueuing response to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:01:54,969] /xx.xx.xx.92: Sending REQUEST_RESPONSE message to /xx.xx.xx.91
[2017-06-29 19:02:04,842] Some repair failed
[2017-06-29 19:02:04,848] Repair command #1 finished in 1 minute 2 seconds
error: Repair job has failed with the error message: [2017-06-29 19:02:04,842] Some repair
failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2017-06-29 19:02:04,842]
Some repair failed
at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)



FYI, I am running repair from the xx.xx.xx.91 node, and it's a 5-node cluster (xx.xx.xx.91-xx.xx.xx.95).
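
Trace output like this is highly repetitive, which makes the interesting lines hard to spot. A small sketch that drops the leading timestamp and counts how often each distinct message occurs, keeping first-seen order. It assumes every line starts with a bracketed timestamp as in the excerpt above; the function name is made up.

```python
import re
from collections import Counter

# Leading "[2017-06-29 19:01:54,969] " style timestamp.
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")


def summarize_trace(lines):
    """Return (message, count) pairs for each distinct message, in first-seen order."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if message:
            counts[message] += 1
    # Counter keeps insertion order, so messages appear as first encountered.
    return list(counts.items())
```

Messages that occur only once, such as the failure line itself, then stand out from the bulk of repeated commitlog and response entries.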

On Wed, Jun 28, 2017 at 5:16 PM, Akhil Mehra <akhilmehra@gmail.com> wrote:
nodetool repair has a trace option:

nodetool repair -tr yourkeyspacename

See if that provides you with additional information.

Regards,
Akhil

On 28/06/2017, at 2:25 AM, Balaji Venkatesan <venkatesan.balaji@gmail.com> wrote:


We use Apache Cassandra 3.10-13

On Jun 26, 2017 8:41 PM, "Michael Shuler" <michael@pbandjelly.org> wrote:
What version of Cassandra?

--
Michael

On 06/26/2017 09:53 PM, Balaji Venkatesan wrote:
> Hi All,
>
> When I run nodetool repair on a keyspace I constantly get a "Some repair
> failed" error, and there is not sufficient info to debug further. Any help?
>
> Here is the stacktrace
>
> ======================================================================
> [2017-06-27 02:44:34,275] Some repair failed
> [2017-06-27 02:44:34,279] Repair command #3 finished in 33 seconds
> error: Repair job has failed with the error message: [2017-06-27
> 02:44:34,275] Some repair failed
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error
> message: [2017-06-27 02:44:34,275] Some repair failed
> at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
> at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
> ======================================================================
>
>
> --
> Thanks,
> Balaji Venkatesan.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org





--
Thanks,
Balaji Venkatesan.


