From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Knowing when there is a *real* need to add nodes
Date: Fri, 20 May 2011 14:13:39 +1200
To: user@cassandra.apache.org

Considering disk usage is a tricky one. Compacted SSTable files will remain on disk until either there is not enough space or the JVM GC runs. To measure the live space use the "Space used (live)" figure from cfstats; "Space used (total)" includes space that has been compacted but not yet deleted from disk.
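
Something like this shows the gap per column family (the grep labels are from memory, adjust to your cfstats output):

  # Compare live vs total space per column family; the difference is
  # compacted data still waiting to be unlinked from disk.
  nodetool cfstats --host localhost | \
    egrep 'Column Family:|Space used \(live\)|Space used \(total\)'
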
The data in deleted columns *may* be purged from disk during a minor or major compaction. This can happen before GCGraceSeconds has expired. It is only the tombstone that must be kept around for at least GCGraceSeconds.

I agree that 50% utilisation on the data directories is a sensible soft limit that will help keep you out of trouble. The space needed by the compaction depends on which bucket of files it is compacting, but it will always require at least as much free disk space as the files it is compacting. That should also leave headroom for adding new nodes, just in case. Ideally, when adding new nodes, existing nodes only stream data to the new nodes. If however you are increasing the node count by less than a factor of 2 you may need to make multiple moves and the nodes may need additional space.
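
As a very rough headroom check (the data dir path below is an assumption, point it at your DataFileDirectories):

  # Keep at least as much free space as the live data you already have,
  # so any compaction bucket can fit. The path is an example only.
  DATA_KB=$(du -sk /var/lib/cassandra/data | awk '{print $1}')
  FREE_KB=$(df -k /var/lib/cassandra/data | awk 'NR==2 {print $4}')
  [ "$FREE_KB" -lt "$DATA_KB" ] && echo "free space below live data size - compaction headroom is tight"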

To gauge the throughput I would also look at the latency trackers on the o.a.c.db.StorageProxy MBean. They track the latency of complete requests, including talking to the rest of the cluster. The metrics on the individual column families are concerned with the local read.
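
The local per-CF numbers are easy to pull from cfstats; the StorageProxy attributes (RecentReadLatencyMicros etc., names from memory) need a JMX client such as jconsole:

  # Local (per column family) read/write latency as reported by cfstats.
  nodetool cfstats --host localhost | \
    egrep 'Column Family:|Read Latency|Write Latency'
  # Cluster-level request latency lives on the StorageProxy MBean
  # (org.apache.cassandra.db:type=StorageProxy) - view it in jconsole.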

For the pending TP stats I would guess that, for the read and write pools, a pending value consistently higher than the number of threads assigned (in the config) would be something to investigate. Waiting on these stages will be reflected in the StorageProxy latency numbers. HintedHandoff, StreamStage and AntiEntropyStage will have tasks that stay in the pending queue for a while. AFAIK the other pools should not have many (< 10) tasks in the pending queue and should be able to clear it.
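
A rough way to flag that (32 is an assumed concurrent_reads/concurrent_writes value, and the column position assumes the Active/Pending/Completed layout of tpstats):

  # Flag read/write stages whose pending count exceeds the configured
  # thread count (32 here is only an example value).
  nodetool tpstats --host localhost | \
    awk '/ReadStage|MutationStage/ && $3 > 32 {print $1, "pending =", $3}'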

Hope that helps.
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 May 2011, at 19:50, Tomer B wrote:

As for static disk usage I would add this:

test: df -kh
description: run the test after compaction (check GCGraceSeconds in storage-conf.xml) as only then is data expunged permanently; run on the data disk, assuming the commitlog disk is separate from the data dir.
green gauge: used_space < 30% of disk capacity
yellow gauge: used_space 30% - 50% of disk capacity
red gauge: used_space > 50% of disk capacity
comments: Compactions can temporarily require up to 100% of the in-use space (data file dir) in the worst case. When approaching 50% or more of disk capacity, use raid0 for the data dir disk; if you cannot, try increasing your disk; if you cannot, consider adding nodes (or consider adding nodes first if that's what you prefer).
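
A small sketch of that gauge (the mount point is an assumption, thresholds as above):

  # Classify data-dir disk usage into the green/yellow/red bands above.
  PCT=$(df -k /var/lib/cassandra/data | awk 'NR==2 {gsub("%","",$5); print $5}')
  if   [ "$PCT" -lt 30 ]; then echo "green  (${PCT}% used)"
  elif [ "$PCT" -le 50 ]; then echo "yellow (${PCT}% used)"
  else                         echo "red    (${PCT}% used)"
  fi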

2011/5/12 Watanabe Maki <watanabe.maki@gmail.com>
It's an interesting topic for me too.
How about adding measurements of static disk utilization (% used) and memory utilization (rss, JVM heap, JVM GC)?
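
For the memory side, something along these lines works on most boxes (the pgrep pattern assumes the daemon class name appears in the command line):

  # Resident set size of the Cassandra JVM, plus heap occupancy and GC activity.
  PID=$(pgrep -f CassandraDaemon)
  ps -o rss= -p "$PID"          # resident memory in kB
  jstat -gcutil "$PID" 5000 5   # 5 samples, 5 seconds apart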

maki

From iPhone


On 2011/05/12, at 0:49, Tomer B <tomerbd1@gmail.com> wrote:
 
> Hi
>
> I'm trying to predict when my cluster will soon need new nodes
> added. I want a continuous graph telling me of my cluster health so
> that when I see my cluster becoming more and more busy (I want numbers
> & measurements) I will know I need to start purchasing more
> machines and get them into my cluster, so I want to know of that
> beforehand.
> I'm writing here what I came up with after doing some research over the net.
> I would highly appreciate any additional gauge measurements and ranges
> to test my cluster health against and to know beforehand when I'm
> going to need more nodes soon. Although I'm writing down green
> gauge, yellow gauge, red gauge, I'm also trying to find a continuous
> graph that tells where our cluster stands (as much as
> possible...)
>
> Also my recommendation is always before adding new nodes:
>
> 1. Make sure all nodes are balanced and if not balance them (see the ring check sketched after this list).
> 2. Separate commit log drive from data (SSTables) drive
> 3. Use mmap_index_only rather than auto for the disk access mode.
> 4. Increase disk IO if possible.
> 5. Avoid swapping as much as possible.
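
On point 1, a quick way to eyeball balance (the host is a placeholder):

  # Load and ownership per node; a balanced ring has roughly even "Owns"
  # percentages. Use nodetool move to rebalance if it does not.
  nodetool -h <cassandra_host> ring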
>
>
> As for my gauge tests for when to add new nodes:
>
> test: nodetool tpstats -h <cassandra_host>
> green gauge: no pending column with a count higher than ~100
> yellow gauge: pending columns 100-2000
> red gauge: larger than 3000
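
A sketch of that gauge (bands approximated from the above; column position assumes the Active/Pending/Completed layout):

  # Classify the worst pending count from tpstats into the bands above.
  MAX=$(nodetool tpstats -h localhost | awk 'NF>=4 && $3 ~ /^[0-9]+$/ {if ($3>m) m=$3} END {print m+0}')
  if   [ "$MAX" -lt 100 ];  then echo "green  (max pending $MAX)"
  elif [ "$MAX" -le 3000 ]; then echo "yellow (max pending $MAX)"
  else                           echo "red    (max pending $MAX)"
  fi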
>
> test: iostat -x -n -p -z 5 10  and iostat -xcn 5
> green gauge: kw/s + kr/s is below 25% of disk IO capacity
> yellow gauge: 20%-50%
> red gauge: 50%+
>
> test: iostat -x -n -p -z 5 10 and check the %b column
> green gauge: less than 10%
> yellow gauge:  10%-80%
> red gauge: 90%+
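
A sketch for the %b gauge (column positions assume Solaris-style iostat -xn output, and sd0 is a placeholder device name):

  # Watch the %b (busy) column for the data disk and bucket it roughly
  # into the bands above. %b is field 10 and the device name field 11 here.
  iostat -xn 5 | awk '$11 == "sd0" {
    b = $10
    if (b < 10)      print "green  %b=" b
    else if (b < 90) print "yellow %b=" b
    else             print "red    %b=" b
  }'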
>
> test: nodetool cfstats --host localhost
> green gauge: "SSTable count" item does not continually grow over time
> yellow gauge:
> red gauge: "SSTable count" item continually grows over time
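
To see whether the count is actually trending up, sample it over time and graph the output:

  # Log SSTable counts every 10 minutes; column families whose count
  # only ever grows are the ones to investigate.
  while true; do
    date
    nodetool cfstats --host localhost | egrep 'Column Family:|SSTable count'
    sleep 600
  done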
>
> test: ./nodetool cfstats --host localhost | grep -i pending
> green gauge: 0-2
> yellow gauge: 3-100
> red gauge: 101+
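
Sampled the same way, so a one-off spike is not mistaken for a trend:

  # Sample pending task counts once a minute; sustained values in the
  # yellow/red bands above are what matter, not brief spikes.
  while true; do
    date
    nodetool cfstats --host localhost | grep -i pending
    sleep 60
  done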
>
> I would highly appreciate any additional gauge measurements and = ranges
> in order to test my cluster health and to know ***beforehand*** = when
> i'm going to soon need more nodes.

