Date: Sat, 7 Dec 2013 14:28:59 +0100
Subject: Re: How to monitor the progress of a HintedHandoff task?
From: Tom van den Berge
To: user@cassandra.apache.org

Rahul,

I've made some progress in my investigations in the meantime. It seems
that the network bandwidth to my remote data center is relatively small,
and at the same time my application generates far more write operations
than I was expecting, resulting in more replication data to the remote DC.

In the case of a network hiccup, or a sudden peak in data generated by my
application (or both), it seems that the network capacity to the remote DC
is simply not sufficient to keep up with the data. This results in the
hints piling up.

On top of that, my Cassandra nodes are equipped with a moderate amount of
memory (4G). This might simply not be enough to maintain the hints and
other column families in memtables. When the problem occurs, I can see
that the node is very busy flushing the hints memtable to disk, which
obviously results in high CPU/IO load.

I've managed to significantly reduce the number of write/delete operations
from my application, which should greatly decrease the rate at which the
hints CF grows in case of timeouts to the remote DC. I'm also planning to
put some more memory in the servers. Can you think of other sensible
things I might have missed?

Thanks for your feedback -- it's highly appreciated!

Tom

On Fri, Dec 6, 2013 at 4:41 PM, Rahul Menon wrote:

> Tom,
>
> You should look at phi_convict_threshold and try increasing the value if
> you have too much chatter on your network.
>
> Also, rebuilding the entire node because of an OOM does not make sense.
> Could you please post the C* version that you are using and the heap size
> you have configured?
>
> Thanks
> Rahul
>
> On Tue, Dec 3, 2013 at 7:41 PM, Tom van den Berge wrote:
>
>> Rahul,
>>
>> This problem occurs every now and then, and currently everything is ok,
>> so there are no hints. But whenever it happens, the hints quickly pile
>> up. This results in heap problems on the node ("Heap is 0.813462
>> full..." appears many times). This in turn results in the flushing of
>> the 'hints' column family, to relieve memory pressure. (According to the
>> log message, its size varies between 50 and 60 MB.) But since the
>> HintedHandoffManager is reading from the hints CF, it will probably pull
>> it back into a memtable again -- that's at least my understanding of how
>> it works.
>>
>> So I guess that flushing the hints CF while the HintedHandoffManager is
>> working on it only makes things worse, and it could be the reason that
>> the process never ends.
>>
>> What I typically see when this happens is that the hints keep piling up,
>> and eventually the node comes to a grinding halt (OOM). Then I have to
>> rebuild the node entirely (only removing the hints doesn't work).
>>
>> The reason for hints to start accumulating in the first place might be a
>> spike in CF writes that must be replicated to a node in another data
>> center. The available bandwidth to that data center might not be able to
>> handle the data quickly enough, resulting in stored hints. The
>> HintedHandoff task that is started is targeting that remote node.
>>
>> Thanks,
>> Tom
>>
>> On Tue, Dec 3, 2013 at 2:22 PM, Rahul Menon wrote:
>>
>>> Tom,
>>>
>>> Do you know why these hints are piling up? What is the size of the
>>> hints CF?
>>>
>>> Thanks
>>> Rahul
>>>
>>> On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> I have never seen a message like "Timed out replaying hints to...",
>>>> which is a good thing then, I suppose ;)
>>>>
>>>> Normally, I do see the "Finished hinted handoff..." log message.
>>>> However, every now and then this message is not logged, not even after
>>>> several hours. This is the problem I'm trying to solve.
>>>>
>>>> The log messages you describe are quite coarse-grained; they only tell
>>>> you that a task has started or finished, but not how this task is
>>>> progressing. And that's exactly what I would like to know if I see
>>>> that a task has started, but has not finished after a reasonable
>>>> amount of time.
>>>>
>>>> So I guess the only way to learn the progress is to look inside the
>>>> 'hints' column family then. I'll give that a try.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon wrote:
>>>>
>>>>> Tom,
>>>>>
>>>>> You should check the size of the hints column family to determine how
>>>>> many hints are present. The hints CF is a super column family whose
>>>>> keys are destination tokens. You could look at it if you would like.
>>>>>
>>>>> Hint sends and timeouts are logged; you should be seeing something
>>>>> like
>>>>>
>>>>> Timed out replaying hints to {}; aborting ({} delivered
>>>>>
>>>>> OR
>>>>>
>>>>> Finished hinted handoff of {} rows to endpoint {}
>>>>>
>>>>> Thanks
>>>>> Rahul
>>>>>
>>>>> On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is there a way to monitor the progress of a hinted handoff task?
>>>>>>
>>>>>> I found the following two mbeans providing some info:
>>>>>>
>>>>>> org.apache.cassandra.internal:type=HintedHandoff, which tells me
>>>>>> that there is 1 active task, and
>>>>>> org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
>>>>>> which quite often gives a timeout when executed.
>>>>>>
>>>>>> Ideally, I would like to see how many hints have been sent (e.g.
>>>>>> over the last minute or so), and how many hints are still to be sent
>>>>>> (although I assume that's what countPendingHints normally does?)
>>>>>>
>>>>>> I'm experiencing hinted handoff tasks that are started, but never
>>>>>> finish, so I would like to know what the task is doing.
>>>>>>
>>>>>> My log shows this:
>>>>>>
>>>>>> INFO [HintedHandoff:1] 2013-12-02 13:49:05,325
>>>>>> HintedHandOffManager.java (line 297) Started hinted handoff for
>>>>>> host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
>>>>>> (nothing more for [HintedHandoff:1])
>>>>>>
>>>>>> The node is up and running, the network connection is ok, no gossip
>>>>>> messages appear in the logs.
>>>>>>
>>>>>> Any idea is welcome.
>>>>>> (Cassandra 1.2.3)
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Drillster BV
>>>>>> Middenburcht 136
>>>>>> 3452MT Vleuten
>>>>>> Netherlands
>>>>>>
>>>>>> +31 30 755 5330
>>>>>>
>>>>>> Open your free account at www.drillster.com
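
Since the log only records when a handoff starts and when it ends, one way to spot a stuck task is to pair those messages up. The following is a minimal sketch, not an official tool: the patterns are modelled on the log lines quoted in this thread, the assumption that endpoints are printed as "/<IPv4 address>" in the end-of-task messages is mine, and `unfinished_handoffs` is a hypothetical helper name.

```python
import re

# Patterns modelled on the messages quoted in this thread. Exact wording and
# endpoint formatting may differ between Cassandra versions (assumption).
STARTED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r".*Started hinted handoff for host: (?P<host>\S+) with IP: /(?P<ip>[\d.]+)"
)
ENDED = re.compile(
    r"(?:Finished hinted handoff of \d+ rows to endpoint"
    r"|Timed out replaying hints to) /(?P<ip>[\d.]+)"
)


def unfinished_handoffs(lines):
    """Return {ip: start_timestamp} for handoffs with a start but no end line."""
    open_tasks = {}
    for line in lines:
        started = STARTED.search(line)
        if started:
            # Remember the most recent start per target IP.
            open_tasks[started.group("ip")] = started.group("ts")
            continue
        ended = ENDED.search(line)
        if ended:
            # A "Finished" or "Timed out" line closes the task for that IP.
            open_tasks.pop(ended.group("ip"), None)
    return open_tasks


# The single log line from the original question: started, never finished.
SAMPLE = [
    "INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java "
    "(line 297) Started hinted handoff for host: "
    "6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66",
]

print(unfinished_handoffs(SAMPLE))
```

This only shows which handoffs are still open and since when. It says nothing about how many hints have been delivered so far, which is the finer-grained progress information asked for in the thread.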
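
The bandwidth reasoning in this thread (writes outpacing the link to the remote DC, so hints pile up) can be made concrete with some rough arithmetic. This is only a back-of-the-envelope sketch: all numbers are made up for illustration, the function names are hypothetical, and it ignores per-hint storage overhead, compression and any replay throttling.

```python
def hint_backlog_mb(write_mb_per_s, link_mb_per_s, seconds):
    """Replication data (MB) that cannot be shipped during the interval."""
    # Only the part of the write rate above link capacity accumulates.
    surplus = max(0.0, write_mb_per_s - link_mb_per_s)
    return surplus * seconds


def drain_time_s(backlog_mb, write_mb_per_s, link_mb_per_s):
    """Seconds needed to replay a backlog using spare link capacity."""
    spare = link_mb_per_s - write_mb_per_s
    if spare <= 0:
        # No spare capacity: the backlog never shrinks.
        return float("inf")
    return backlog_mb / spare


# Hypothetical figures: a 10-minute spike of 3 MB/s over a 2 MB/s link,
# followed by a steady 1.5 MB/s of regular replication traffic.
backlog = hint_backlog_mb(3.0, 2.0, 600)
print(backlog)                          # 600.0 MB left behind as hints
print(drain_time_s(backlog, 1.5, 2.0))  # 1200.0 s to catch up afterwards
```

With these made-up figures, a ten-minute spike takes twenty minutes to replay, and if regular traffic alone fills the link, the hints never drain. That matches the observation that reducing the application's write rate should slow the growth of the hints CF.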