cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "HadoopSupport" by jeremyhanna
Date Fri, 04 Mar 2011 00:04:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
The comment on this change is: Adding some more troubleshooting info in a separate section..
http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=26&rev2=27

--------------------------------------------------

   * [[#Pig|Pig]]
   * [[#Hive|Hive]]
   * [[#ClusterConfig|Cluster Configuration]]
+  * [[#Troubleshooting|Troubleshooting]]
   * [[#Support|Support]]
  
  <<Anchor(Overview)>>
@@ -37, +38 @@

  
  ==== Hadoop Streaming ====
  As of 0.7, there is support for [[http://hadoop.apache.org/common/docs/r0.20.0/streaming.html|Hadoop Streaming]].  For examples on how to use Streaming with Cassandra, see the contrib section of the Cassandra source.  The relevant tickets are [[https://issues.apache.org/jira/browse/CASSANDRA-1368|CASSANDRA-1368]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1497|CASSANDRA-1497]].
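 
 As a rough sketch only, a Streaming job is launched with the generic `hadoop jar` streaming syntax below; the input/output paths and the mapper/reducer scripts are placeholders, and the Cassandra-specific wiring is what the contrib examples demonstrate:
 
 {{{
 hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
     -input myInputDir -output myOutputDir \
     -mapper myMapper.py -reducer myReducer.py
 }}}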
- 
- ==== Some troubleshooting ====
- Releases before 0.6.2/0.7 are affected by a small resource leak that may cause jobs to fail (connections are not released properly, causing a resource leak). Depending on your local setup you may hit this issue, and workaround it by raising the limit of open file descriptors for the process (e.g. in linux/bash using `ulimit -n 32000`).  The error will be reported on the hadoop job side as a thrift !TimedOutException.
- 
- If you are testing the integration against a single node and you obtain some failures, this may be normal: you are probably overloading the single machine, which may again result in timeout errors. You can workaround it by reducing the number of concurrent tasks
- 
- {{{
-              Configuration conf = job.getConfiguration();
-              conf.setInt("mapred.tasktracker.map.tasks.maximum",1);
- }}}
- Also, you may reduce the size in rows of the batch you  are reading from cassandra
- 
- {{{
-              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
- }}}
  [[#Top|Top]]
  
  <<Anchor(Pig)>>
@@ -93, +79 @@

  
  [[#Top|Top]]
  
+ <<Anchor(Troubleshooting)>>
+ 
+ == Troubleshooting ==
+ If you are running into timeout exceptions, you might need to tweak one or both of these settings:
+  * '''cassandra.range.batch.size''' - the default is 4096, but you may need to lower this depending on your data.  This can be set either in your Hadoop configuration or via `org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSize`.
+  * '''rpc_timeout_in_ms''' - this is set in your `cassandra.yaml` (in 0.6 it's `RpcTimeoutInMillis` in `storage-conf.xml`).  This timeout applies to requests between nodes, not from the client; increasing it reduces the chance of timing out.
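+ 
+ As an illustrative sketch (the value below is an example, not a recommendation), the rpc timeout can be raised in `cassandra.yaml` like so:
+ 
+ {{{
+ # inter-node rpc timeout, in milliseconds (example value)
+ rpc_timeout_in_ms: 30000
+ }}}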
+ 
+ Releases before 0.6.2/0.7 are affected by a small resource leak that may cause jobs to fail (connections are not released properly). Depending on your local setup you may hit this issue; you can work around it by raising the limit of open file descriptors for the process (e.g. on Linux/bash using `ulimit -n 32000`).  The error will be reported on the Hadoop job side as a thrift !TimedOutException.
+ 
+ If you are testing the integration against a single node and you see some failures, this may be normal: you are probably overloading the single machine, which may again result in timeout errors. You can work around this by reducing the number of concurrent tasks:
+ 
+ {{{
+              // cap concurrent map tasks per tasktracker (here: one at a time)
+              Configuration conf = job.getConfiguration();
+              conf.setInt("mapred.tasktracker.map.tasks.maximum", 1);
+ }}}
+ Also, you may reduce the number of rows per batch that you read from Cassandra:
+ 
+ {{{
+              ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);
+ }}}
+ 
+ [[#Top|Top]]
+ 
  <<Anchor(Support)>>
  
  == Support ==
