cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Repair giving error
Date Mon, 22 Jan 2018 12:55:56 GMT
By the way,

As you plan to use (or are using) incremental repairs, be aware that there
are some downsides to doing so. You might want to read this post from my
colleague Alexander:
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

My judgment is probably somewhat biased, but I believe it is worth having a
look ;-).

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-22 12:34 GMT+00:00 Alain RODRIGUEZ <arodrime@gmail.com>:

> Hello,
>
> Some other thoughts:
>
> - Are you using secured internode communications? If so, the port to
> check is 7001 instead of 7000.
> - A rolling restart might help. Have you tried restarting a few or all of
> the nodes?
>
> This issue is very weird and I am only making poor guesses here. This is
> not an issue I have seen in the past, so it might help to see the raw
> outputs (nodetool status <keyspace>, keyspace replication strategy, WARN or
> ERROR logs, ...) and also the command you are running.
> Also, have any operations been run on this cluster recently that might
> have triggered this (RF change, snitch change, new DC, or any other major
> change)? It is good for us to know the history to get a feel for what the
> cluster state might currently be.
> Did this same command use to run and now fails, or are repairs something
> you are trying to add that has never worked so far?
>
> Some context might help us to help you :-),
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2018-01-18 19:00 GMT+00:00 Akshit Jain <akshit13124@iiitd.ac.in>:
>
>> Hi Alain,
>> Thanks for the response.
>> I'm using Cassandra 3.10.
>> nodetool status <keyspace> shows all the nodes up.
>> No schema disagreement.
>> Port 7000 is open.
>>
>> Regards
>> Akshit Jain
>> 9891724697
>>
>> On Thu, Jan 18, 2018 at 4:53 PM, Alain RODRIGUEZ <arodrime@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> It looks like a communication issue.
>>>
>>> What Cassandra version are you using?
>>> What's the result of 'nodetool status <keyspace>'?
>>> Any schema disagreement ('nodetool describecluster')?
>>> Is port 7000 open and are the nodes communicating with each other?
>>> (Ping does not prove the connection is up, even though it is good to
>>> know the machine is there and up :)).
>>> Any other errors you could see in the logs?
>>>
>>> You might want to consider an open source project my coworkers have
>>> been working on (and are maintaining) called Reaper. It aims at making
>>> repairs more efficient and easier to manage, as repair is one of the
>>> trickiest operations to handle for a Cassandra operator:
>>> http://cassandra-reaper.io/. I did not work on this project directly,
>>> but we have had good feedback and like this tool ourselves.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>>
>>>
>>> 2018-01-14 7:47 GMT+00:00 Akshit Jain <akshit13124@iiitd.ac.in>:
>>>
>>>> I have a 10-node C* cluster with 4-5 keyspaces.
>>>> I tried to run nodetool repair one by one for each keyspace.
>>>> For some keyspaces the repair passed, but for some it gave the error
>>>> below.
>>>> I am not able to figure out what is causing this issue. The replica
>>>> nodes are up and I am able to ping them from this node.
>>>> Any suggestions?
>>>>
>>>> Error I am getting on incremental repair:
>>>>
>>>> [2018-01-10 12:50:14,047] Did not get positive replies from all
>>>> endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
>>>>
>>>> -- StackTrace --
>>>> java.lang.RuntimeException: Repair job has failed with the error message: [2018-01-10 12:50:14,047] Did not get positive replies from all endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
>>>>     at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
>>>>     at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
>>>>
>>>
>>>
>>
>
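
As the thread notes, ping only shows that a machine is up, not that the
internode port (7000, or 7001 with secured internode communications) accepts
connections. A TCP-level check from the node running the repair might look
like the minimal sketch below. It assumes Python is available on that node;
the function names and the host list are hypothetical and are not part of
Cassandra or nodetool.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable(hosts, port=7000, timeout=3.0):
    """Return the subset of hosts whose port could not be reached."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Calling unreachable() with the addresses reported as failed endpoints, and
with port=7001 if internode encryption is enabled, returns the hosts that
refused or timed out. An empty list only shows that the port accepts TCP
connections; it does not prove that repair messages are being handled.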
