Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: Apache Wiki <wikidiffs@apache.org>
To: Apache Wiki <wikidiffs@apache.org>
Date: Fri, 04 Mar 2011 00:04:23 -0000
Message-ID: <20110304000423.84160.28394@eosnew.apache.org>
Subject: 
 =?utf-8?q?=5BCassandra_Wiki=5D_Update_of_=22HadoopSupport=22_by_jeremyhan?=
 =?utf-8?q?na?=

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for=
 change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
The comment on this change is: Adding some more troubleshooting info in a s=
eparate section..
http://wiki.apache.org/cassandra/HadoopSupport?action=3Ddiff&rev1=3D26&rev2=
=3D27

--------------------------------------------------

   * [[#Pig|Pig]]
   * [[#Hive|Hive]]
   * [[#ClusterConfig|Cluster Configuration]]
+  * [[#Troubleshooting|Troubleshooting]]
   * [[#Support|Support]]
  =

  <<Anchor(Overview)>>
@@ -37, +38 @@

  =

  =3D=3D=3D=3D Hadoop Streaming =3D=3D=3D=3D
  As of 0.7, there is support for [[http://hadoop.apache.org/common/docs/r0=
.20.0/streaming.html|Hadoop Streaming]].  For examples on how to use Stream=
ing with Cassandra, see the contrib section of the Cassandra source.  The r=
elevant tickets are [[https://issues.apache.org/jira/browse/CASSANDRA-1368|=
CASSANDRA-1368]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1497=
|CASSANDRA-1497]].
- =

- =3D=3D=3D=3D Some troubleshooting =3D=3D=3D=3D
- Releases before  0.6.2/0.7 are affected by a small  resource leak that ma=
y cause jobs to fail (connections are not released  properly, causing a res=
ource leak). Depending on your local setup you  may hit this issue, and wor=
karound it by raising the limit of open file  descriptors for the process (=
e.g. in linux/bash using `ulimit -n 32000`).  The error will be reported on=
  the hadoop job side as a thrift !TimedOutException.
- =

- If you are testing the integration against a single node and you obtain  =
some failures, this may be normal: you are probably overloading the  single=
 machine, which may again result in timeout errors. You can  workaround it =
by reducing the number of concurrent tasks
- =

- {{{
-              Configuration conf =3D job.getConfiguration();
-              conf.setInt("mapred.tasktracker.map.tasks.maximum",1);
- }}}
- Also, you may reduce the size in rows of the batch you  are reading from =
cassandra
- =

- {{{
-              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
- }}}
  [[#Top|Top]]
  =

  <<Anchor(Pig)>>
@@ -93, +79 @@

  =

  [[#Top|Top]]
  =

+ <<Anchor(Troubleshooting)>>
+ =

+ =3D=3D Troubleshooting =3D=3D
+ If you are running into timeout exceptions, you might need to tweak one o=
r both of these settings:
+  * '''cassandra.range.batch.size''' - the default is 4096, but you may ne=
ed to lower this depending on your data.  Set it either in your Hadoop conf=
iguration or with `org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSi=
ze`.
+  * '''rpc_timeout_in_ms''' - this is set in your `cassandra.yaml` (in 0.6=
 it's `RpcTimeoutInMillis` in `storage-conf.xml`).  The RPC timeout governs=
 requests between nodes, not requests from the client.  Increasing it reduc=
es the chance of timeouts.
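+ =

+ A minimal sketch of setting the batch size through the Hadoop configurati=
on, assuming the job loads a standard Hadoop XML configuration resource (t=
he value 1000 is only illustrative and should be tuned to your data):
+ =

+ {{{
+ <property>
+   <name>cassandra.range.batch.size</name>
+   <value>1000</value>
+ </property>
+ }}}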
+ =

+ Releases before 0.6.2/0.7 are affected by a small resource leak that may =
cause jobs to fail because connections are not released properly.  Dependi=
ng on your local setup you may hit this issue; you can work around it by r=
aising the limit of open file descriptors for the process (e.g. in Linux/b=
ash using `ulimit -n 32000`).  The error is reported on the Hadoop job si=
de as a Thrift !TimedOutException.
+ =

+ If you are testing the integration against a single node and see failure=
s, this may be normal: you are probably overloading the single machine, wh=
ich can also result in timeout errors.  You can work around this by reduci=
ng the number of concurrent tasks:
+ =

+ {{{
+              Configuration conf =3D job.getConfiguration();
+              conf.setInt("mapred.tasktracker.map.tasks.maximum",1);
+ }}}
+ You can also reduce the number of rows in each batch read from Cassandra:
+ =

+ {{{
+              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
+ }}}
+ =

+ [[#Top|Top]]
+ =

  <<Anchor(Support)>>
  =

  =3D=3D Support =3D=3D