Date: Mon, 13 Jul 2015 18:31:48 -0400
From: David Haguenauer
To: user@cassandra.apache.org
Subject: Bulk loading performance

Hi,

I have a use case in which I receive a daily batch of data of about 50M to 100M records (each record is a list of integers, keyed by a UUID). The target is a 12-node cluster.

Using a simple-minded approach (24 batched inserts in parallel, using the Ruby client) while the cluster is being read at about 150k reads/s, I get about 15.5k insertions per second. That throughput is satisfactory in itself; the concern is that the large volume of writes drives read latency up during the load, and for a while afterwards.

I tried using sstableloader instead, and the overall throughput is similar (I spend 2/3 of the time preparing the SSTables and 1/3 actually pushing them to the nodes), but I believe this still causes a spike in read latency (after the load is complete).

Is there a set of best practices for this kind of workload? We would like to interfere with reads as little as possible.

I can of course post more information about our setup and requirements if that would help.

--
Thanks,
David Haguenauer
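For scale, the figures in the message imply a daily load window of roughly one to two hours. The sketch below is a back-of-the-envelope estimate only: it assumes the observed ~15.5k inserts/s stays constant for the whole batch, and, because sstableloader's overall throughput was reported as similar, it assumes the same total duration for sstableloader, split 2/3 preparation and 1/3 streaming. The helper names are hypothetical.

```python
# Back-of-the-envelope load-window estimate from the figures in the message.
# Assumption: the observed insert rate holds steady for the whole batch.

INSERT_RATE_PER_S = 15_500                # observed: 24 parallel batched inserts
BATCH_SIZES = (50_000_000, 100_000_000)   # daily batch range, in records


def load_window_minutes(records: int, rate_per_s: float) -> float:
    """Wall-clock minutes needed to write `records` at a constant rate."""
    return records / rate_per_s / 60


def sstableloader_split(total_minutes: float) -> tuple[float, float]:
    """Split a total duration into (SSTable preparation, streaming) using the
    roughly 2/3 : 1/3 ratio reported for sstableloader.
    Assumption: sstableloader's total time equals the insert-based estimate."""
    prep = total_minutes * 2 / 3
    return prep, total_minutes - prep


if __name__ == "__main__":
    for n in BATCH_SIZES:
        minutes = load_window_minutes(n, INSERT_RATE_PER_S)
        prep, stream = sstableloader_split(minutes)
        print(f"{n:>11,} records: ~{minutes:.0f} min "
              f"(sstableloader: ~{prep:.0f} min prep, ~{stream:.0f} min streaming)")
```

With these inputs, 50M records take about 54 minutes and 100M records about 108 minutes, so any read-latency impact spans a window of that order plus whatever post-load tail was observed.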