From: Mattias Larsson
To: "user@cassandra.apache.org"
Date: Fri, 26 Oct 2012 11:56:08 -0700
Subject: Re: Hinted Handoff storage inflation

On Oct 24, 2012, at 6:05 PM, aaron morton wrote:

> Hints store the columns, row key, KS name and CF id(s) for each mutation to each node, whereas an executed mutation stores only the most recent columns, collated with others under the same row key. So depending on the type of mutation, hints will take up more space.
>
> The worst case would be lots of overwrites. After that, writing a small amount of data to many rows would result in a lot of the serialised space being devoted to row keys, KS name and CF id.
>
> 16Gb is a lot though. What was the write workload like?

Each write is new data only (no overwrites). Each mutation adds a row to one column family with a column containing about 100 bytes of data, and a new row to another column family with a SuperColumn containing 2x17KiB payloads. These are sent in batches of several mutations each, but I found that the storage overhead was the same regardless of the size of the batch mutation (i.e., 5 vs 25 mutations made no difference).
A total of 1,000,000 mutations like these are sent over the duration of the test.

> You can get an estimate on the number of keys in the Hints CF using nodetool cfstats. Also some metrics in the JMX will tell you how many hints are stored.
>
>> This has a huge impact on write performance as well.
> Yup. Hints are added to the same Mutation thread pool as normal mutations. They are processed async to the mutation request but they still take resources to store.
>
> You can adjust how long hints are collected for with max_hint_window_in_ms in the yaml file.
>
> How long did the test run for?

With both data centers functional, the test takes just a few minutes to run; with one data center down, it takes 15x as long.

/dml
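
For a rough sense of scale, the raw payload implied by the workload described above can be estimated with a few lines of Python. This is only a back-of-envelope sketch: it counts column values alone and ignores row keys, KS name, CF ids, serialization overhead and replication, and the variable and function names are made up for illustration.

```python
# Back-of-envelope size of the raw payload written during the test.
# The three figures below come from the workload description in this
# message; the names and the choice to ignore all overhead are
# illustrative assumptions.

KIB = 1024

small_column_bytes = 100            # ~100-byte column in the first column family
super_column_bytes = 2 * 17 * KIB   # SuperColumn holding 2 x 17 KiB payloads
mutations = 1_000_000               # total mutations sent during the test


def raw_payload_bytes(n_mutations: int) -> int:
    """Payload bytes only: no row keys, KS name, CF ids, or replication."""
    return n_mutations * (small_column_bytes + super_column_bytes)


total = raw_payload_bytes(mutations)
print(f"per mutation: {raw_payload_bytes(1)} bytes")   # 34916 bytes
print(f"total: {total / KIB**3:.1f} GiB")              # 32.5 GiB
```

Because the payload is dominated by the two 17 KiB values, the fixed per-mutation cost of row keys and identifiers should be small relative to the data in this particular workload.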