From: Kirk True
To: user@cassandra.apache.org
Subject: Re: read request distribution
Date: Mon, 12 Nov 2012 16:24:02 -0800
Message-Id: <1352766242.1659.140661152841949.3A3327EB@webmail.messagingengine.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="_----------=_135276624216590"; charset="utf-8"

--_----------=_135276624216590
Content-Transfer-Encoding: 7bit
Content-Type: text/html

Somewhat recently the Ownership column was changed to Effective Ownership. 
 
Previously the formula was essentially 100/<nodes>. Now it's 100*<replication factor>/<nodes>. So in previous releases of Cassandra it would be 100/12 = 8.33%; now it would be closer to 25% (8.33*3, assuming a replication factor of three).
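
The arithmetic above can be checked with a small sketch. It is only an illustration of the two formulas as stated here, assuming evenly spaced tokens; the function name `ownership_percent` is hypothetical and not part of Cassandra or nodetool, and capping the result at 100% is an added assumption (a node cannot hold more than one full copy of the data).

```python
def ownership_percent(nodes: int, replication_factor: int = 1) -> float:
    """Approximate per-node ownership, in percent, for evenly spaced tokens.

    With replication_factor=1 this is the older 100/<nodes> figure;
    with the real replication factor it is the newer
    100*<replication factor>/<nodes> figure.
    """
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    # Assumption: cap at 100%, since one node holds at most one full copy.
    return min(100.0, 100.0 * replication_factor / nodes)


if __name__ == "__main__":
    # 12-node cluster, old-style figure (replication ignored)
    print(round(ownership_percent(12), 2))
    # 12-node cluster, RF=3, effective ownership
    print(ownership_percent(12, 3))
    # 3-node cluster, RF=3: every node holds all of the data
    print(ownership_percent(3, 3))
```

Under these assumptions, a 12-node cluster with RF=3 gives 25% per node, and a 3-node cluster with RF=3 gives 100% per node, which matches the readings in the original post.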
 
Kirk
 
On Mon, Nov 12, 2012, at 03:52 PM, Ananth Gundabattula wrote:
Hi all,
 
On an unrelated note about the readings below, it looks like all 3 nodes own 100% of the data. This confuses me a bit. We have a 12-node cluster with RF=3, but the effective ownership is shown as 8.33%.
 
So here is my question: how is the ownership calculated? Is the replication factor considered in the ownership calculation? If yes, then 8.33% ownership in our cluster seems wrong to me. If not, 100% ownership for each node of a 3-node cluster seems wrong to me. Am I missing something in the calculation?
 
Regards,
Ananth
 
On Fri, Nov 9, 2012 at 4:37 PM, Wei Zhu <wz1975@yahoo.com> wrote:
Hi All,
I am running a benchmark on Cassandra. I have a three-node cluster with RF=3. I generated 6M rows with sequence numbers from 1 to 6M, so the rows should be evenly distributed among the three nodes, disregarding the replicas.
The benchmark issues read-only requests: I generate read requests for randomly generated keys from 1 to 6M. Oddly, nodetool cfstats reports that one node has only half as many requests as another, and the third node sits in the middle, so the ratio is roughly 2:3:4. The node with the most read requests actually has the lowest latency, and the one with the fewest read requests reports the highest latency. The difference is pretty big: the slowest node's latency is almost double the fastest's.
All three nodes have exactly the same hardware, and the data size on each node is the same, since the RF is three and all of them have the complete data. I am using Hector as the client, and the random read requests number in the millions. I can't think of a reasonable explanation. Can someone please shed some light?
 
Thanks.
-Wei
--_----------=_135276624216590--