I am new to Cassandra. I was trying out sample (word count) code using BulkOutputFormat and got stuck on an error.


What I am trying to do is migrate all Hive tables (from a Hadoop cluster) to Cassandra column families.

My MR program is configured to run on a Hadoop cluster, v0.20.2 (cdh3u3), by setting the job config params 'fs.default.name' and 'mapred.job.tracker' appropriately.

The output is pointed at my local Cassandra v1.1.7 instance.

I have set the following params for writing to Cassandra:

    conf.set("cassandra.output.keyspace", "Customer");
    conf.set("cassandra.output.columnfamily", "words");
    conf.set("cassandra.output.partitioner.class", "org.apache.cassandra.dht.RandomPartitioner");
    conf.set("cassandra.output.thrift.port", "9160");    // default
    conf.set("cassandra.output.thrift.address", "localhost");
    conf.set("mapreduce.output.bulkoutputformat.streamthrottlembits", "10");


But the program fails with the error below:

12/12/13 15:32:55 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Cassandra thrift address   :      localhost
Cassandra thrift port      :      9160
12/12/13 15:32:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/12/13 15:34:21 INFO input.FileInputFormat: Total input paths to process : 1
12/12/13 15:34:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/13 15:34:21 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/13 15:34:22 INFO mapred.JobClient: Running job: job_201212111101_4622
12/12/13 15:34:23 INFO mapred.JobClient:  map 0% reduce 0%
12/12/13 15:34:28 INFO mapred.JobClient:  map 100% reduce 0%
12/12/13 15:34:37 INFO mapred.JobClient:  map 100% reduce 33%
12/12/13 15:34:39 INFO mapred.JobClient: Task Id : attempt_201212111101_4622_r_000000_0, Status : FAILED
java.lang.RuntimeException: Could not retrieve endpoint ranges:
       at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:328)
       at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:116)
       at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:111)
       at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:223)
       at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:208)
       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:573)
       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
       at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
       at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectE
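
The trace ends in a Thrift transport exception raised while BulkRecordWriter tries to retrieve the endpoint ranges, so one thing that may be worth checking is whether the configured Thrift address and port are reachable from the node where the reduce task actually runs. Note that "localhost" is resolved on whichever machine executes the task, which on a remote Hadoop cluster is not necessarily the machine running Cassandra. Below is a minimal sketch of such a check using only java.net. The class name ThriftPortCheck, the method isReachable, and the 2-second timeout are illustrative assumptions and are not part of Cassandra or Hadoop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ThriftPortCheck {

    // Returns true if a plain TCP connection to host:port succeeds within
    // timeoutMillis. This only tests reachability; it does not speak Thrift.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the values set in the job config above.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 2000));
    }
}
```

Running this on one of the Hadoop task nodes with the same address and port as in the job config (for example `java ThriftPortCheck localhost 9160`) would show whether that node can open a connection to the Cassandra Thrift port at all.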


Please help me understand the problem.



Anand B

