Date: Thu, 9 May 2013 21:09:06 -0400 (EDT)
From: "John R. Frank" <jrf@mit.edu>
To: user@cassandra.apache.org
Subject: pycassa failures in large batch cycling

C* users,

We have a process that loads a large batch of rows from Cassandra into many
separate compute workers. The rows are one column wide and range in size
from a couple of KB to ~100 MB. After manipulating the data for a while,
each compute worker writes the data back under *new* row keys computed by
the workers (UUIDs). After the full batch has been written back to new
rows, a cleanup worker deletes the old rows.

After several cycles, pycassa starts getting connection failures. Should we
use a pycassa listener to catch these failures, recreate the ConnectionPool,
and keep going as if the connection had never dropped (a rough sketch of
what we mean is in the P.S. below)? Or is there a better approach?

These failures happen even on a simple single-node setup with a total data
set less than half the size of the Java heap, e.g. 2 GB of data (times two
for the two copies that exist during cycling) versus an 8 GB heap. We tried
reducing memtable_flush_queue_size to 2 so that the deletes would flush
faster, and also tried multithreaded_compaction=true, but pycassa still
gets connection failures.

Is this the expected behavior when the node is shedding load, or is it
unexpected? Would things be any different if we used multiple nodes and
scaled the data and worker count to match? In other words, is there
something inherent in Cassandra's operating model that makes it want to
always run with multiple nodes?

Thanks for pointers,
John
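
P.S. To make the listener question concrete, here is a rough, untested
sketch of what we have in mind. It assumes pycassa's ConnectionPool accepts
a listeners= list of PoolListener-style objects and that connection_failed
is the right callback for dropped connections; the keyspace and column
family names and the helpers (make_pool, FailureLogger, run_cycle) are just
placeholders, not our real code.

    import logging
    import uuid

    import pycassa
    from pycassa.pool import (ConnectionPool, PoolListener,
                              AllServersUnavailable, MaximumRetryException)

    log = logging.getLogger("batch_cycler")

    class FailureLogger(PoolListener):
        """Log pool-level connection failures so we can see when C* drops us."""
        def connection_failed(self, dic):
            log.warning("pycassa connection failed: %r", dic)

    def make_pool():
        # placeholder keyspace/server; listeners= takes PoolListener-like objects
        return ConnectionPool('OurKeyspace',
                              server_list=['localhost:9160'],
                              timeout=5,
                              max_retries=5,
                              listeners=[FailureLogger()])

    def write_back(pool, values):
        # each worker writes its results back under brand-new UUID row keys
        cf = pycassa.ColumnFamily(pool, 'Docs')
        for value in values:
            cf.insert(uuid.uuid4().hex, {'body': value})

    def run_cycle(values):
        pool = make_pool()
        while True:
            try:
                write_back(pool, values)
                return pool
            except (AllServersUnavailable, MaximumRetryException):
                # the part we are asking about: throw away the dead pool,
                # build a fresh one, and retry as if nothing happened
                log.warning("recreating ConnectionPool and retrying")
                pool.dispose()
                pool = make_pool()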