From: Benjamin Black
To: user@cassandra.apache.org
Date: Sun, 22 Aug 2010 14:15:47 -0700
Subject: Re: Node OOM Problems

On Sun, Aug 22, 2010 at 2:03 PM, Wayne wrote:
> From a testing whether cassandra can take the load long term, I do not
> see it as different. Yes, bulk loading can be made faster using very
> different

Then you need far more IO, whether it comes from faster drives or more
nodes.  If you can achieve 10k writes/sec/node and linear scaling without
sharding in MySQL on cheap, commodity hardware, then I am impressed.

> methods, but my purpose is to test cassandra with a large volume of
> writes (and not to bulk load as efficiently as possible). I have scaled
> back to 5 writer threads per node and still see 8k writes/sec/node.
> With the larger memtable settings we shall see how it goes. I have no
> idea how to change a JMX setting and prefer to use std options, to be
> frank. For us this is

If you want the best performance, you must tune the system appropriately.
If you want to use the base settings (which are intended for the 1G max
heap, which is way too small for anything interesting), expect suboptimal
performance for your application.
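For reference, changing a setting over JMX can be done at runtime, without
restarting or patching anything. A minimal sketch in Java is below; the
host, port, MBean name, and attribute are placeholders rather than
Cassandra-specific values, and in practice pointing jconsole at the node's
JMX port is the easiest way to find the real ones.

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal sketch of setting an MBean attribute over JMX at runtime.
// Host, port, MBean name, and attribute below are placeholders
// (assumptions); check the node's start script for its JMX port and
// use jconsole to find the exact MBean and attribute you want.
public class JmxSetExample {
    public static void main(String[] args) throws Exception {
        String host = "localhost";   // node to tune (assumption)
        int port = 8080;             // JMX port from the start script (assumption)
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical MBean and attribute; substitute the ones jconsole
            // shows for the setting you actually want to adjust.
            ObjectName name = new ObjectName("org.example:type=SomeService");
            mbs.setAttribute(name, new Attribute("SomeThreshold", 64));
            System.out.println("New value: " + mbs.getAttribute(name, "SomeThreshold"));
        } finally {
            connector.close();
        }
    }
}

Note that attribute changes made this way generally do not persist across a
restart; the on-disk configuration still needs to be updated separately.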
> after all an evaluation of whether Cassandra can replace Mysql.
>
> I thank everyone for their help.
>
> On Sun, Aug 22, 2010 at 10:37 PM, Benjamin Black wrote:
>>
>> Wayne,
>>
>> Bulk loading this much data is a very different prospect from needing
>> to sustain that rate of updates indefinitely.  As was suggested
>> earlier, you likely need to tune things differently, including
>> disabling minor compactions during the bulk load, to make this work
>> efficiently.
>>
>>
>> b
>>
>> On Sun, Aug 22, 2010 at 12:40 PM, Wayne wrote:
>> > Has anyone loaded 2+ terabytes of real data in one stretch into a
>> > cluster without bulk loading and without any problems? How long did
>> > it take? What kind of nodes were used? How many writes/sec/node can
>> > be sustained for 24+ hours?
>> >
>> > On Sun, Aug 22, 2010 at 8:22 PM, Peter Schuller wrote:
>> >>
>> >> I only sifted recent history of this thread (for time reasons), but:
>> >>
>> >> > You have started a major compaction which is now competing with
>> >> > those near constant minor compactions for far too little I/O
>> >> > (3 SATA drives in RAID0, perhaps?).  Normally, this would result
>> >> > in a massive ballooning of your heap use as all sorts of
>> >> > activities (like memtable flushes) backed up as well.
>> >>
>> >> AFAIK memtable flushing is unrelated to compaction in the sense that
>> >> they occur concurrently and don't block each other (except to the
>> >> extent that they truly do compete for e.g. disk or CPU resources).
>> >>
>> >> While small memtables do indeed mean more compaction activity in
>> >> total, the cost of any given compaction should not be severely
>> >> affected.
>> >>
>> >> As far as I can tell, the two primary effects of small memtable
>> >> sizes are:
>> >>
>> >> * An increase in the total amount of compaction work done for a
>> >> given database size.
>> >> * An increase in the number of sstables that may accumulate while
>> >> larger compactions are running.
>> >> ** That in turn is particularly relevant because it can generate a
>> >> lot of seek-bound activity; consider for example range queries that
>> >> end up spanning 10,000 files on disk.
>> >>
>> >> If memtable flushes are not able to complete fast enough to cope
>> >> with write activity, even if that is only the case during concurrent
>> >> compaction (for whatever reason), that suggests to me that write
>> >> activity is too high. Increasing memtable sizes may help on average
>> >> due to decreased compaction work, but I don't see why it would
>> >> significantly affect performance once compactions *do* in fact run.
>> >>
>> >> With respect to timeouts on writes: I make no claims as to whether
>> >> it is expected, because I have not yet investigated, but I
>> >> definitely see sporadic slowness when benchmarking high-throughput
>> >> writes on a cassandra trunk snapshot somewhere between 0.6 and 0.7.
>> >> This occurs even when writing to a machine where the commit log and
>> >> data directories are both on separate RAID volumes that are battery
>> >> backed and should have no trouble eating write bursts (and the data
>> >> is such that one is CPU-bound rather than disk-bound on average, so
>> >> it only needs to eat bursts).
>> >>
>> >> I've had to add retry to the benchmarking tool (or else up the
>> >> timeout) because the default was not enough.
>> >>
>> >> I have not investigated exactly why this happens, but it's an
>> >> interesting effect that, as far as I can tell, should not be there.
>> >> Have other people done high-throughput writes (to the point of CPU
>> >> saturation) over extended periods of time while consistently seeing
>> >> low latencies (consistently meaning never exceeding hundreds of ms
>> >> over several days)?
>> >>
>> >>
>> >> --
>> >> / Peter Schuller
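On the retry point above: below is a minimal, generic sketch of the kind
of retry wrapper Peter describes adding to his benchmarking tool. It is
not Cassandra's client API; the Callable would wrap whatever write call
the tool actually makes, and the attempt count and sleep interval are
arbitrary assumptions.

import java.util.concurrent.Callable;

// Generic retry wrapper of the sort described above: retry an operation a
// few times before giving up, pausing briefly between attempts.  The
// attempt count and sleep are arbitrary; the Callable wraps whatever
// client write call the benchmarking tool actually makes.
public final class Retry {
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {       // e.g. a client-side timeout
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);  // brief pause before the next attempt
                }
            }
        }
        throw last != null ? last : new IllegalArgumentException("maxAttempts must be >= 1");
    }
}

// Usage (hypothetical write call):
//   Retry.withRetries(() -> client.insert(...), 3, 100);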