From: aaron morton
To: user@cassandra.apache.org
Subject: Re: BloomFilter
Date: Tue, 5 Feb 2013 06:04:44 +1300

> 1) What is the ratio of the sstable file size to bloom filter size? If I have an sstable of 1 GB, what is the approximate bloom filter size? Assuming the default value of 0.000744 is configured.

The size of the bloom filter varies with the number of rows in the CF, not the on-disk size. More precisely, it depends on the number of rows in each SSTable, since a row can be stored in multiple SSTables.

nodetool cfstats reports the total bloom filter size for each CF.

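For a rough feel of the numbers, here is a minimal sketch using the textbook formula for an optimally sized bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an assumption for illustration: Cassandra's own sizing may round differently, and `bloom_filter_bytes` and the 1 KiB row size are hypothetical, not taken from Cassandra.

```python
import math


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Textbook size, in bytes, of an optimally sized bloom filter.

    Not Cassandra's actual sizing code; an estimate only.
    """
    if num_keys < 0 or not 0.0 < fp_chance < 1.0:
        raise ValueError("need num_keys >= 0 and 0 < fp_chance < 1")
    # Optimal bits per key: m/n = -ln(p) / (ln 2)^2
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return num_keys * bits_per_key / 8


if __name__ == "__main__":
    # Hypothetical workload: a 1 GiB SSTable holding rows of about 1 KiB each.
    rows = (1 << 30) // 1024
    size = bloom_filter_bytes(rows, 0.000744)
    print(f"{rows} rows -> about {size / 2**20:.2f} MiB of bloom filter")
```

At 0.000744 the formula works out to roughly 15 bits per row key, so under these assumptions a 1 GiB SSTable of 1 KiB rows needs a little under 2 MB, and the same row size at 10 TB per node lands on the order of 20 GB. Smaller rows mean more keys and a proportionally larger filter for the same disk size.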
> 2) The bloom filters are stored in RAM but not in the heap from 1.2 onwards?

They are always in RAM. Before 1.2 they were stored in the JVM heap; from 1.2 onwards they are stored off heap.

> 3) What is the ratio of the RAM/Disk per node? What is the max disk size recommended for 1 node? If I have 10 TB of data per node, how much RAM will the bloom filter consume?

If you are using spinning disks (HDD) and have 1 Gb networking, I would consider 300GB to 500GB per node a good rule of thumb for a small (<6 node) cluster.

The issues have to do with the time it takes to run nodetool repair and the time it takes to replace a failed node. Once you have a feel for how long these take, you may want to put more data on each node.
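To see why node density matters for replacement time, here is a back-of-envelope sketch of how long it takes just to move a node's data over the network. It assumes the link is the only bottleneck, which is optimistic: real streaming and repair also spend time on disk I/O, compaction and validation. `transfer_hours` and its `efficiency` parameter are hypothetical names for illustration.

```python
def transfer_hours(data_gb: float, link_gbit_per_s: float = 1.0,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb gigabytes over a network link.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    if data_gb < 0 or link_gbit_per_s <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("invalid data size, link speed or efficiency")
    # Convert gigabits per second to bytes per second.
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return data_gb * 1e9 / bytes_per_second / 3600


if __name__ == "__main__":
    for size_gb in (500, 10_000):
        print(f"{size_gb} GB at 1 Gb/s: about {transfer_hours(size_gb):.1f} hours")
```

Even at full line rate, 500 GB takes a bit over an hour while 10 TB takes most of a day, and that is before any of the other costs are counted.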

In 1.2 there are things that make replacing a node faster, but they tend to kick in at higher node counts.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/02/2013, at 6:45 AM, Kanwar Sangha <kanwar@mavenir.com> wrote:

> Hi - Couple of questions -
>
> 1) What is the ratio of the sstable file size to bloom filter size? If I have an sstable of 1 GB, what is the approximate bloom filter size? Assuming the default value of 0.000744 is configured.
>
> 2) The bloom filters are stored in RAM but not in the heap from 1.2 onwards?
>
> 3) What is the ratio of the RAM/Disk per node? What is the max disk size recommended for 1 node? If I have 10 TB of data per node, how much RAM will the bloom filter consume?
>
> Thanks,
> kanwar
