incubator-cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: Corrupted data
Date Mon, 11 Jul 2011 02:31:04 GMT
Oh, the error seems to come from JMX.


Sorry, but it seems I don't have any more error messages; the node repair just
never ends... and running strace on the process finds nothing, it is not doing
anything.

Is there any way to get more information about this? Do I need to do a major
compaction on every column family? Thanks!

On Mon, Jul 11, 2011 at 1:36 AM, aaron morton <aaron@thelastpickle.com> wrote:

> 1) Do I need to treat every node as failed and do a rolling replacement?
> There might be some inconsistency in the cluster even though I have no way
> to find out.
>
> see
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
> 2) Is that the reason the node repair hung? The log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> I cannot find that anywhere in the code base. Can you provide some more
> information?
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10 Jul 2011, at 03:26, Yan Chunlu wrote:
>
> I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes
> and have not run node repair for more than 10 days; I was not aware that
> this is critical. I ran node repair recently and one of the nodes always
> hangs... from the log it seems to be doing nothing related to the repair.
>
> So I have two problems:
>
> 1) Do I need to treat every node as failed and do a rolling replacement?
> There might be some inconsistency in the cluster even though I have no way
> to find out.
> 2) Is that the reason the node repair hung? The log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> then nothing.
>
> thanks!
>
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
>
>> >> - Have you been running repair consistently ?
>> >
>> > Nop, only when something breaks
>>
>> This is unrelated to the problem you were asking about, but if you
>> never run delete, make sure you are aware of:
>>
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> http://wiki.apache.org/cassandra/DistributedDeletes
>>
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> 闫春路
>
>
>


-- 
Charles
