incubator-cassandra-user mailing list archives

From Brian Jeltema <brian.jelt...@digitalenvoy.net>
Subject cassandra/hadoop BulkOutputFormat failures
Date Fri, 14 Sep 2012 18:34:20 GMT
I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class.
It appears that the reducers are generating the SSTables but are failing to load them into
the cluster:

12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_000004_0, Status : FAILED
java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
        at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
        at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)   
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)  
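
To compare the rejected hosts against the cluster's node list, the addresses can be pulled
out of the exception text. This is a minimal sketch, not part of Cassandra or Hadoop: the
function name failed_hosts and the MESSAGE constant are made up for illustration, and the
pattern assumes the message keeps the "Too many hosts failed: [...]" form shown above.

```python
import re

# Exception message copied from the task log in this message.
MESSAGE = (
    "java.io.IOException: Too many hosts failed: "
    "[/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]"
)


def failed_hosts(message):
    """Return the sorted host addresses listed in a 'Too many hosts failed' message."""
    # DOTALL lets the bracketed list span a wrapped line, as it does in some log viewers.
    match = re.search(r"Too many hosts failed: \[(.*?)\]", message, re.DOTALL)
    if match is None:
        return []
    # Entries look like "/10.4.0.6"; drop the leading slash and surrounding whitespace.
    hosts = [entry.strip().lstrip("/") for entry in match.group(1).split(",")]
    return sorted(host for host in hosts if host)


if __name__ == "__main__":
    for host in failed_hosts(MESSAGE):
        print(host)
```

For the message above the function returns six addresses, 10.4.0.1 through 10.4.0.6.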

A brief look at the BulkOutputFormat class shows that it depends on SSTableLoader. My Hadoop
cluster and my Cassandra cluster are co-located on the same set of machines. I haven't found
any stated restrictions, but does this technique only work if the Hadoop cluster is distinct
from the Cassandra cluster? Any suggestions on how to get past this problem?

Thanks in advance.

Brian