cassandra-user mailing list archives

From Maxim Potekhin <potek...@bnl.gov>
Subject Re: Repair failure under 0.8.6
Date Sun, 04 Dec 2011 23:51:31 GMT
As a side effect of the failed repair (so it seems), the disk usage on the
affected node now prevents compaction from working. Compaction still works
on the remaining nodes (we have 3 total).
Is there a way to scrub the extraneous data?

Thanks

Maxim


On 12/4/2011 4:29 PM, Peter Schuller wrote:

>> I will try to increase phi_convict -- I will just need to restart the
>> cluster after
>> the edit, right?
> You will need to restart the nodes for which you want the phi convict
> threshold to be different. You might want to do it on, e.g., half of the
> cluster for A/B testing.
>
>> I do recall that I see nodes temporarily marked as down, only to pop up
>> later.
> I recommend grepping through the logs on all the nodes (e.g., cat
> /var/log/cassandra/cassandra.log | grep UP | wc -l). That should tell
> you quickly whether they all seem to be seeing roughly as many node
> flaps, or whether some particular node or set of nodes is
> over-represented.
>
> Next, look at the actual nodes flapping (remove wc -l) and see if all
> nodes are flapping or if it is a single node, or a subset of the nodes
> (e.g., sharing a switch perhaps).
>
>> In the current situation, there is no load on the cluster at all, outside
>> the
>> maintenance like the repair.
> Ok. So what I'm getting at then is that there may be real, legitimate
> connectivity problems that you aren't noticing in any other way, since
> you don't have active traffic to the cluster.
>
>
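
For reference, a minimal sketch of the phi_convict change discussed above,
assuming a stock cassandra.yaml where phi_convict_threshold ships commented
out with its default of 8; the config path, the sed invocation and the value
12 are illustrative assumptions, not a recommendation:

    # Assumed config path -- adjust for your install.
    # Raise the failure-detector threshold on this node.
    sudo sed -i 's/^#* *phi_convict_threshold:.*/phi_convict_threshold: 12/' \
        /etc/cassandra/cassandra.yaml

    # Restart Cassandra so the new threshold takes effect; do this only on
    # the nodes you want to change (e.g. half the ring for A/B testing).
    sudo /etc/init.d/cassandra restart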

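Similarly, a rough sketch of the flap count suggested above, run against all
three nodes; the hostnames and log path are placeholders:

    # Count gossip "UP" transitions in each node's log; a node whose count
    # is far higher (or lower) than its peers is a good place to dig further.
    for h in node1 node2 node3; do
        echo -n "$h: "
        ssh "$h" "grep -c UP /var/log/cassandra/cassandra.log"
    done

Dropping the -c and reading the matched lines shows which peers each node
sees flapping, per the second suggestion above.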
