incubator-cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: cassandra/hadoop BulkOutputFormat failures
Date Sat, 15 Sep 2012 03:34:13 GMT
A couple of guesses:
- are you mixing versions of Cassandra? Streaming differences between versions could cause
this error. That is, are you bulk loading with one version of Cassandra into a cluster that's
running a different version?
- (shot in the dark) is your cluster overwhelmed for some reason?

If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw.

Jeremy

On Sep 14, 2012, at 1:34 PM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:

> I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class.
> It appears that the reducers are generating the SSTables but are failing to load them
> into the cluster:
> 
> 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_000004_0, Status : FAILED
> java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
>        at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
>        at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
>        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
>        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 
> A brief look at the BulkOutputFormat class shows that it depends on SSTableLoader. My
> Hadoop cluster and my Cassandra cluster are co-located on the same set of machines. I
> haven't found any stated restrictions, but does this technique only work if the Hadoop
> cluster is distinct from the Cassandra cluster? Any suggestions on how to get past this
> problem?
> 
> Thanks in advance.
> 
> Brian

