incubator-cassandra-user mailing list archives

From Maxim Potekhin <potek...@bnl.gov>
Subject Re: Repair failure under 0.8.6
Date Sun, 04 Dec 2011 21:12:53 GMT
Thanks Peter!

I will try to increase phi_convict_threshold -- I will just need to
restart the cluster after the edit, right?

I do recall seeing nodes temporarily marked as down, only to pop back
up later.

In the current situation, there is no load on the cluster at all,
apart from maintenance such as the repair.

How do I configure the print level for the GC report?

Thank you,
Maxim


On 12/4/2011 2:09 PM, Peter Schuller wrote:
>> I capped heap and the error is still there. So I keep seeing "node dead"
>> messages even when I know the nodes were OK. Where and how do I tweak
>> timeouts?
> You can increase phi_convict_threshold in the configuration. However,
> I would rather want to find out why they are being marked as down to
> begin with. In a healthy situation, especially if you are not putting
> extreme load on the cluster, there is very little reason for hosts to
> be marked as down unless there's some bug somewhere.
>
> Is this cluster under constant traffic? Are you seeing slow requests
> from the point of view of the client (indicating that some requests
> are routed to nodes that are temporarily inaccessible)?
>
> With respect to GC, I would recommend running with -XX:+PrintGC and
> -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps and
> -XX:+PrintGCDateStamps and then look at the system log. A fallback to
> full GC should be findable by grepping for "Full".
>
> Also, is this a problem with one specific host, or is it happening to
> all hosts every now and then? And I mean either the host being flagged
> as down, or the host that is flagging others as down.
>
> As for uncapped heap: Generally a larger heap is not going to make it
> more likely to fall back to full GC; usually the opposite is true.
> However, a larger heap can make some of the non-full GC pauses longer,
> depending. In either case, running with the above GC options will
> give you specific information on GC pauses and should allow you to
> rule that out (or not).
>
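
For what it is worth, the "grep for Full" step suggested above could be sketched roughly as follows. This is only an illustration, not anything shipped with Cassandra: the function name full_gc_pauses is made up, and the regular expression assumes the usual HotSpot line shape in which the pause is reported as "N.NNN secs]". Adjust it if your JVM prints something different.

```python
import re

# Assumption: with the GC flags quoted above, a collection is logged roughly as
#   2011-12-04T21:13:00.000+0000: 17.377: [Full GC 1900K->800K(2000K), 3.25 secs]
# The exact layout varies by JVM version and collector.
PAUSE_RE = re.compile(r"([0-9]+\.[0-9]+) secs\]")


def full_gc_pauses(lines):
    """Return a list of (line, pause_seconds) for lines that mention "Full".

    pause_seconds is None when no "N.NNN secs]" figure is found on the line.
    """
    hits = []
    for line in lines:
        if "Full" not in line:
            continue
        matches = PAUSE_RE.findall(line)
        # Take the last "secs]" figure on the line; on detailed output this is
        # normally the total (or "real") time for the whole collection.
        pause = float(matches[-1]) if matches else None
        hits.append((line.rstrip("\n"), pause))
    return hits
```

To use it, open whichever file the GC output ends up in, pass the file object to full_gc_pauses(), and compare the timestamps of any multi-second pauses against the times at which nodes were reported as dead.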

