From: Piavlo
To: user@cassandra.apache.org
Date: Wed, 30 Oct 2013 01:09:30 +0200
Subject: not even number of keys per CFs in fully balanced cluster with random partitioner
Hi,

There is a 12-node cluster, still stuck on Cassandra 1.0.8. All nodes in the ring are balanced, using the random partitioner. All CFs use compression.

Data size on the nodes varies from 40G to 75G. This variance is not due to the bigger nodes having more uncompacted sstables than the others.

Most of the biggest CFs have exactly the same row keys and just store different data, so the data for a given key should end up on the same node for all of these CFs. For each of these big CFs, the key estimate on the nodes with the largest data size is almost twice the key estimate on the nodes with the smallest data size. In other words, the key count is proportional to the data size on the node. These CFs have about 50-100 million rows per node.

I can't understand how it is statistically possible that, with the random partitioner, some nodes have 2x more keys than others when each node holds 50-100 million keys. Any ideas how this is possible? Is there anything else I can check?

tnx
Alex
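A rough back-of-the-envelope check of the "statistically impossible" intuition follows. It is a sketch under stated assumptions, not Cassandra code. It assumes that each distinct key hashes uniformly onto the ring and that each of 12 nodes owns exactly 1/12 of it. It also models the RandomPartitioner token as the absolute value of the MD5 digest read as a signed 128-bit integer, with evenly spaced initial tokens i * 2**127 / 12. The helper names (`expected_relative_stddev`, `random_partitioner_token`, `simulate_key_counts`) are hypothetical.

```python
import bisect
import hashlib
import math

NODES = 12
TOKEN_SPACE = 2 ** 127  # assumed RandomPartitioner token range [0, 2**127]


def expected_relative_stddev(keys_per_node: float, nodes: int = NODES) -> float:
    """Binomial model: each key lands on a given node with probability 1/nodes.
    Returns stddev / mean of that node's key count, i.e. sqrt((1 - p) / mean)."""
    p = 1.0 / nodes
    return math.sqrt((1 - p) / keys_per_node)


def random_partitioner_token(key: bytes) -> int:
    """Assumed RandomPartitioner-style token: abs(MD5 digest as signed 128-bit int)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def balanced_ring(nodes: int = NODES) -> list[int]:
    """Evenly spaced initial tokens: i * 2**127 / nodes (hypothetical layout)."""
    return [i * TOKEN_SPACE // nodes for i in range(nodes)]


def owner(token: int, ring: list[int]) -> int:
    """A node owns the range (previous token, its token]; tokens past the last
    node's token wrap around to node 0."""
    i = bisect.bisect_left(ring, token)
    return 0 if i == len(ring) else i


def simulate_key_counts(num_keys: int, nodes: int = NODES) -> list[int]:
    """Hash synthetic keys b"key0", b"key1", ... and count keys per node."""
    ring = balanced_ring(nodes)
    counts = [0] * nodes
    for k in range(num_keys):
        counts[owner(random_partitioner_token(b"key%d" % k), ring)] += 1
    return counts


if __name__ == "__main__":
    for per_node in (50e6, 100e6):
        print(f"{per_node:.0e} keys/node -> relative stddev "
              f"{expected_relative_stddev(per_node):.2e}")
    counts = simulate_key_counts(120_000)
    print("simulated counts:", counts, "max/min:", max(counts) / min(counts))
```

Under this model, 50-100 million keys per node gives a relative standard deviation of roughly 1e-4 (about 0.01%). A 2x skew would therefore be thousands of standard deviations from the mean, and even the small 120,000-key simulation should stay within a few percent of an even split. If the real cluster is that far off, the cause is more likely outside the model: duplicated or non-distinct keys, tokens that are not actually balanced, or inaccurate key estimates.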