cassandra-user mailing list archives

From Maxim Potekhin <potek...@bnl.gov>
Subject Re: Repair failure under 0.8.6
Date Sun, 04 Dec 2011 21:15:55 GMT
Please disregard the GC part of the question -- I found it.

On 12/4/2011 4:12 PM, Maxim Potekhin wrote:
> Thanks Peter!
>
> I will try to increase phi_convict -- I will just need to restart the
> cluster after the edit, right?
>
> I do recall seeing nodes temporarily marked as down, only to pop back
> up later.
>
> At the moment there is no load on the cluster at all, apart from
> maintenance such as the repair.
>
> How do I configure the print level for the GC report?
>
> Thank you,
> Maxim
>
>
> On 12/4/2011 2:09 PM, Peter Schuller wrote:
>>> I capped the heap and the error is still there. So I keep seeing "node
>>> dead" messages even when I know the nodes were OK. Where and how do I
>>> tweak timeouts?
>> You can increase phi_convict_threshold in the configuration. However,
>> I would rather find out why they are being marked as down to
>> begin with. In a healthy situation, especially if you are not putting
>> extreme load on the cluster, there is very little reason for hosts to
>> be marked as down unless there's some bug somewhere.
>>
>> Is this cluster under constant traffic? Are you seeing slow requests
>> from the point of view of the client (indicating that some requests
>> are routed to nodes that are temporarily inaccessible)?
>>
>> With respect to GC, I would recommend running with -XX:+PrintGC,
>> -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps and
>> -XX:+PrintGCDateStamps, and then looking at the system log. A fallback
>> to full GC should be findable by grepping for "Full".
>>
>> Also, is this a problem with one specific host, or is it happening to
>> all hosts every now and then? And I mean either the host being flagged
>> as down, or the host that is flagging others as down.
>>
>> As for uncapped heap: Generally a larger heap is not going to make it
>> more likely to fall back to full GC; usually the opposite is true.
>> However, a larger heap can make some of the non-full GC pauses longer,
>> depending. In either case, running with the above GC options will
>> give you specific information on GC pauses and should allow you to
>> rule that out (or not).
>>
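
The "grep for Full" step suggested above can be sketched in Python. This is
only an illustration, not part of the original advice: the function name
find_full_gcs is hypothetical, and the pattern assumes the JVM prints a
trailing "real=N.NN secs" figure on each GC line, as HotSpot generally does
with -XX:+PrintGCDetails. Lines without that figure are still reported, with
no pause value.

```python
import re

# Assumption: GC lines end with something like "[Times: ... real=2.50 secs]".
REAL_SECS = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def find_full_gcs(lines):
    """Return (line, pause_seconds_or_None) for every line containing "Full"."""
    hits = []
    for line in lines:
        if "Full" not in line:
            continue  # same filter as grepping for "Full"
        match = REAL_SECS.search(line)
        pause = float(match.group(1)) if match else None
        hits.append((line.rstrip("\n"), pause))
    return hits
```

To use it, open whichever file the JVM's GC output ends up in and pass the
file object to find_full_gcs; long or frequent pauses around the times that
nodes were flagged as down would point at GC, and their absence would help
rule it out.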

