From: aaron morton
Subject: Re: 33million hinted handoffs from nowhere
Date: Tue, 19 Mar 2013 05:51:35 +1300
To: user@cassandra.apache.org

You can check which nodes hints are being held for using the JMX API. Look for the org.apache.cassandra.db:type=HintedHandoffManager MBean and call the listEndpointsPendingHints() function.
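A minimal sketch of that JMX call from a standalone Java client, for anyone who would rather script it than click through jconsole. The MBean name and operation name are the ones quoted above; the host, the port 7199 (Cassandra's usual JMX port, assumed here), the absence of JMX authentication, and the class name PendingHints are assumptions for illustration. The return value is printed as-is rather than cast to a specific type.

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class PendingHints {
    // MBean name as quoted in the message above.
    static final String MBEAN = "org.apache.cassandra.db:type=HintedHandoffManager";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Invokes the no-argument operation and returns whatever the MBean returns.
    static Object listPending(MBeanServerConnection conn) throws Exception {
        return conn.invoke(new ObjectName(MBEAN), "listEndpointsPendingHints",
                new Object[0], new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: local node, JMX on 7199, no JMX credentials.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199;
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            Object result = listPending(connector.getMBeanServerConnection());
            System.out.println("Endpoints with pending hints: " + result);
        } catch (IOException e) {
            System.err.println("Could not reach JMX at " + url + ": " + e.getMessage());
        }
    }
}
```

Run it against the node whose hints column family is growing, passing the host and port as arguments if they differ from the assumed defaults.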

There are two points where hints may be stored: if the node is down when the request starts, or if the node times out and does not return before rpc_timeout. To check for the first, look for log lines about a node being "dead" on the coordinator. To check for the second, look for dropped messages on the other nodes. These will be logged, or you can use nodetool tpstats to look for them.
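As a rough aid for the second check, the sketch below scans tpstats-style text for message types with a non-zero dropped count. It assumes the output ends with a two-column "Message type / Dropped" section, as nodetool tpstats prints in this era of Cassandra; the sample text in main is made up for illustration and is not output from the cluster discussed here. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DroppedMessages {
    // Parses the "Message type / Dropped" section of tpstats-style text and
    // returns only the message types whose dropped count is greater than zero.
    static Map<String, Long> nonZeroDropped(String tpstats) {
        Map<String, Long> dropped = new LinkedHashMap<>();
        boolean inSection = false;
        for (String line : tpstats.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Message type")) {
                inSection = true; // rows after this header are "<TYPE> <count>"
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (!inSection || cols.length != 2) {
                continue;
            }
            try {
                long count = Long.parseLong(cols[1]);
                if (count > 0) {
                    dropped.put(cols[0], count);
                }
            } catch (NumberFormatException e) {
                // Not a count row; ignore it.
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Hypothetical sample, shaped like the tail of `nodetool tpstats`.
        String sample = String.join("\n",
                "Message type           Dropped",
                "READ                         0",
                "MUTATION                  1234",
                "REQUEST_RESPONSE             0");
        System.out.println("Dropped: " + nonZeroDropped(sample));
    }
}
```

Dropped MUTATION messages on a replica would be consistent with the coordinator having timed out on that replica and written a hint for it.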

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/03/2013, at 2:30 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:

( The previous letter was sent prematurely, sorry. )

This node is the only node being written to, but the CFs being written replicate to almost all of the other nodes.
My understanding is that hinted handoff is mutations kept around on the coordinator node, to be replayed when the target node re-appears on the ring. All my nodes are up and, again, no hinted handoff is logged on the node itself.

Thanks!
Andras

From: Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com>
Date: Thursday 14 March 2013 14:25
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: 33million hinted handoffs from nowhere

Hi list,

I am experiencing seemingly uncontrollable and unexplained growth of my HintedHandoff CF on a single node. Unexplained because there are no hinted handoffs being logged on the node; uncontrollable because I see 33 million inserts in cfstats and the size of the sstables is over 10 gigs, all in an hour of uptime.


I have done the following to try and reproduce this:

- shut down my cluster
- on all nodes: remove sstables from the HintsColumnFamily data dir
- on all nodes: remove commit logs
- start all nodes but the one that's showing this problem
- nothing is writing to any of the nodes. There are no hinted handoffs going on anywhere
- bring back the node in question last
- few seconds after boot:

                Column Family: HintsColumnFamily
                SSTable count: 1
                Space used (live): 44946532
                Space used (total): 44946532
                Number of Keys (estimate): 256
                Memtable Columns Count: 17840
                Memtable Data Size: 17569909
                Memtable Switch Count: 2
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 184836
                Write Latency: 0.668 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 0
                Bloom Filter False Ratio: 0.00000
                Bloom Filter Space Used: 16
                Compacted row minimum size: 20924301
                Compacted row maximum size: 25109160
                Compacted row mean size: 25109160
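
For tracking how fast these figures grow between samples, a small sketch that splits cfstats-style "Name: value" lines into a map and converts byte counts to MiB. It assumes one "Name: value" pair per line, as in the block above; the values in main are copied from that block, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CfStats {
    // Splits "Name: value" lines into a map, keeping the values as raw strings.
    static Map<String, String> parse(String block) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : block.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" line
            }
            stats.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return stats;
    }

    // Converts a byte count to mebibytes.
    static double bytesToMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Values copied from the cfstats block above.
        String block = String.join("\n",
                "Column Family: HintsColumnFamily",
                "Space used (live): 44946532",
                "Memtable Data Size: 17569909",
                "Write Count: 184836");
        Map<String, String> stats = parse(block);
        long live = Long.parseLong(stats.get("Space used (live)"));
        long memtable = Long.parseLong(stats.get("Memtable Data Size"));
        System.out.printf("%s: %.1f MiB on disk, %.1f MiB in memtable, %s writes%n",
                stats.get("Column Family"), bytesToMiB(live), bytesToMiB(memtable),
                stats.get("Write Count"));
    }
}
```

Sampling cfstats a few minutes apart and diffing the write count gives the hint write rate, which can then be compared with the dropped-message counts on the other nodes.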




