From: Apache Wiki
To: Apache Wiki
Reply-To: dev@cassandra.apache.org
Date: Fri, 04 Mar 2011 00:04:23 -0000
Message-ID: <20110304000423.84160.28394@eosnew.apache.org>
Subject: [Cassandra Wiki] Update of "HadoopSupport" by jeremyhanna

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
The comment on this change is: Adding some more troubleshooting info in a separate section..
http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=26&rev2=27

--------------------------------------------------

   * [[#Pig|Pig]]
   * [[#Hive|Hive]]
   * [[#ClusterConfig|Cluster Configuration]]
+  * [[#Troubleshooting|Troubleshooting]]
   * [[#Support|Support]]
  
  <>
@@ -37, +38 @@

  
  ==== Hadoop Streaming ====
  As of 0.7, there is support for [[http://hadoop.apache.org/common/docs/r0.20.0/streaming.html|Hadoop Streaming]]. For examples on how to use Streaming with Cassandra, see the contrib section of the Cassandra source. The relevant tickets are [[https://issues.apache.org/jira/browse/CASSANDRA-1368|CASSANDRA-1368]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1497|CASSANDRA-1497]].
- 
- ==== Some troubleshooting ====
- Releases before 0.6.2/0.7 are affected by a small resource leak that may cause jobs to fail (connections are not released properly, causing a resource leak). Depending on your local setup you may hit this issue, and workaround it by raising the limit of open file descriptors for the process (e.g. in linux/bash using `ulimit -n 32000`). The error will be reported on the hadoop job side as a thrift !TimedOutException.
- 
- If you are testing the integration against a single node and you obtain some failures, this may be normal: you are probably overloading the single machine, which may again result in timeout errors. You can workaround it by reducing the number of concurrent tasks
- 
- {{{
-              Configuration conf = job.getConfiguration();
-              conf.setInt("mapred.tasktracker.map.tasks.maximum",1);
- }}}
- Also, you may reduce the size in rows of the batch you are reading from cassandra
- 
- {{{
-              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
- }}}
  [[#Top|Top]]
  
  <>
@@ -93, +79 @@

  
  [[#Top|Top]]
  
+ <>
+ 
+ == Troubleshooting ==
+ If you are running into timeout exceptions, you might need to tweak one or both of these settings:
+  * '''cassandra.range.batch.size''' - the default is 4096, but you may need to lower this depending on your data. This is either specified in your hadoop configuration or using `org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSize`.
+  * '''rpc_timeout_in_ms''' - this is set in your `cassandra.yaml` (in 0.6 it's `RpcTimeoutInMillis` in `storage-conf.xml`). The rpc timeout is not for timing out from the client but between nodes. This can be increased to reduce chances of timing out.
+ 
+ Releases before 0.6.2/0.7 are affected by a small resource leak that may cause jobs to fail (connections are not released properly, causing a resource leak). Depending on your local setup you may hit this issue, and workaround it by raising the limit of open file descriptors for the process (e.g. in linux/bash using `ulimit -n 32000`). The error will be reported on the hadoop job side as a thrift !TimedOutException.
+ 
+ If you are testing the integration against a single node and you obtain some failures, this may be normal: you are probably overloading the single machine, which may again result in timeout errors. You can workaround it by reducing the number of concurrent tasks
+ 
+ {{{
+              Configuration conf = job.getConfiguration();
+              conf.setInt("mapred.tasktracker.map.tasks.maximum",1);
+ }}}
+ Also, you may reduce the size in rows of the batch you are reading from cassandra
+ 
+ {{{
+              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
+ }}}
+ 
+ [[#Top|Top]]
+ 
  <>
  
  == Support ==