From: Sergey Olefir
To: cassandra-user@incubator.apache.org
Date: Tue, 27 Nov 2012 09:52:42 -0800 (PST)
Subject: Re: counters + replication = awful performance?
Message-ID: <1354038762579-7583996.post@n2.nabble.com>
References: <1354034910271-7583993.post@n2.nabble.com>

Hi Juan,

thanks for your input!

In my case, however, I doubt this is the case -- clients are able to push
many more updates than I need to saturate the replication_factor=2 case
(e.g. I'm doing as many as 6x more increments when testing the 2-node
cluster with replication_factor=1), so bandwidth between clients and server
should be sufficient. Bandwidth between nodes in the cluster should also be
quite sufficient since they are both in the same DC.

But it is something to check, thanks!

Best regards,
Sergey


Juan Valencia wrote
> Hi Sergey,
>
> I know I've had similar issues with counters which were bottlenecked by
> network throughput. You might be seeing a problem with throughput between
> the clients and Cass or between the two Cass nodes. It might not be your
> case, but that was what happened to me :-)
>
> Juan
>
>
> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@> wrote:
>
>> Hi,
>>
>> I have a serious problem with counters performance and I can't seem to
>> figure it out.
>>
>> Basically I'm building a system for accumulating some statistics "on the
>> fly" via Cassandra distributed counters. For this I need counter updates
>> to work "really fast" and herein lies my problem -- as soon as I enable
>> replication_factor = 2, the performance goes down the drain. This happens
>> in my tests using both 1.0.x and 1.1.6.
>>
>> Let me elaborate:
>>
>> I have two boxes (virtual servers on top of physical servers rented
>> specifically for this purpose, i.e. it's not a cloud, nor is it shared;
>> virtual servers are managed by our admins as a way to limit damage, I
>> suppose :)). Cassandra partitioner is set to ByteOrderedPartitioner
>> because I want to be able to do some range queries.
>>
>> First, I set up Cassandra individually on each box (not in a cluster) and
>> test counter increment performance (exclusively increments, no reads).
>> For tests I use code that is intended to somewhat resemble the expected
>> load pattern -- particularly, the majority of increments create new
>> counters, with some updating (adding to) already existing counters. In
>> this test each single node exhibits respectable performance -- something
>> on the order of 70k (seventy thousand) increments per second.
>>
>> I then join both of these nodes into a single cluster (using SimpleSnitch
>> and SimpleStrategy, nothing fancy yet). I then run the same test using
>> replication_factor=1. The performance is on the order of 120k increments
>> per second -- which seems to be a reasonable increase over the
>> single-node performance.
>>
>>
>> HOWEVER I then rerun the same test on the two-node cluster using
>> replication_factor=2 -- which is the least I'll need for actual
>> production for redundancy purposes. And the performance I get is
>> absolutely horrible -- much, MUCH worse than even single-node performance
>> -- something on the order of less than 25k increments per second. In
>> addition to clients not being able to push updates fast enough, I also
>> see a lot of 'messages dropped' messages in the Cassandra log under this
>> load.
>>
>> Could anyone advise what could be causing such a drastic performance drop
>> under replication_factor=2? I was expecting something on the order of
>> single-node performance, not approximately 3x less.
>>
>>
>> When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes
>> through the roof. On 1.0.x I think it looked more like disk overload, but
>> I'm not sure (being on a virtual server I apparently can't see true
>> iostats).
>>
>> I do have Cassandra data on a separate disk; commit log and cache are
>> currently on the same disk as the system.
>> I experimented with commit log flush modes and even with disabling the
>> commit log altogether -- but it doesn't seem to have a noticeable impact
>> on the performance under replication_factor=2.
>>
>>
>> Any suggestions and hints will be much appreciated :) And please let me
>> know if I need to share additional information about the configuration
>> I'm running on.
>>
>> Best regards,
>> Sergey
>>
>>
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
>> Sent from the cassandra-user@.apache mailing list archive at Nabble.com.
>>
>
>
>
> --
>
> Learn More: SQI (Social Quality Index) - A Universal Measure of Social
> Quality

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7583996.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.