From: Brian Jeltema
Subject: Hadoop/Cassandra 1.2 timeouts
Date: Mon, 24 Jun 2013 12:10:13 -0400
To: user@cassandra.apache.org

I'm having problems with Hadoop job failures on a Cassandra 1.2 cluster due to:

    Caused by: TimedOutException()
    2013-06-24 11:29:11,953 INFO Driver - at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)

This is running on a 6-node cluster, RF=3. If I run the job with CL=ONE, it usually runs pretty well, with an occasional timeout. But if I run at CL=QUORUM, the number of timeouts is often enough to kill the job. The table being read is effectively read-only when this job runs. It has from 5 to 10 million rows, with each row having no more than 256 columns. Each column typically only has a few hundred bytes of data at most.

I've fiddled with the batch-range size and increasing the timeout, without a lot of luck. I see some evidence of GC activity in the Cassandra logs, but it's hard to see a clear correlation with the timeouts.

I could use some suggestions on an approach to pin down the root cause.

TIA

Brian
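
A rough sizing sketch may help frame the numbers in this message. It is not part of the original job and says nothing about the root cause; it only computes how many replicas must answer a read at QUORUM for RF=3, and an upper bound on the column data a single get_range_slices batch could return. The batch sizes (4096, 1024, 256 rows) and the 300 bytes per column are assumed values chosen for illustration, and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM read: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def batch_payload_bytes(rows_per_batch: int,
                        columns_per_row: int,
                        bytes_per_column: int) -> int:
    """Upper bound on the column data returned by one range-slice batch.

    Assumes every row in the batch is full width and every column is
    read; a narrower slice predicate would return less.
    """
    return rows_per_batch * columns_per_row * bytes_per_column


if __name__ == "__main__":
    rf = 3  # from the message: RF=3
    print("replicas that must answer at ONE:   ", 1)
    print("replicas that must answer at QUORUM:", quorum(rf))

    # Assumed values: batch sizes and 300 bytes/column are illustrative only.
    for rows in (4096, 1024, 256):
        size_mb = batch_payload_bytes(rows, 256, 300) / 1e6
        print(f"{rows:>5} rows/batch -> up to {size_mb:.1f} MB per request")
```

Under these assumptions a large batch of full-width rows can be hundreds of megabytes per request, which is one reason to test much smaller batch sizes while watching the GC logs.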