cassandra-user mailing list archives

From Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com>
Subject Cassandra hadoop job fails if any node is DOWN
Date Tue, 13 May 2014 22:51:57 GMT
Hello,

One of the nodes of our Analytics DC is dead, but ColumnFamilyInputFormat
(CFIF) still assigns Hadoop input splits to it. This leads to many failed
tasks and consequently a failed job.

* Tasks fail with: java.lang.RuntimeException:
org.apache.thrift.transport.TTransportException: Failed to open a transport
to XX.75:9160. (obviously, the node is dead)

* Job fails with: Job Failed: # of failed Map Tasks exceeded allowed limit.
FailedCount: 1. LastFailedTask: task_201404180250_4207_m_000079

We use RF=2 and CL=LOCAL_ONE for Hadoop jobs, on C* 1.2.16. Is this expected
behavior?

I checked the CFIF code, and it always assigns input splits to every node in
the ring, whether the node is dead or alive. Our current fix is to patch
CFIF to blacklist the dead node, but this is a manual procedure, not an
automatic one. Am I missing something here?
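
To illustrate the blacklist idea, here is a minimal, self-contained sketch in
plain Java. It is not the actual CFIF patch and uses no Cassandra or Hadoop
classes: SplitLocationFilter, liveReplicas, and the host names node-a and
node-b are hypothetical names made up for this example. It assumes that each
token range comes with a list of replica hosts (two with RF=2) and that the set
of dead hosts is maintained by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Cassandra): removes blacklisted hosts
// from the replica list of a single token range.
final class SplitLocationFilter {
    private SplitLocationFilter() {}

    // Returns the replicas that are not blacklisted, in their original order.
    // Throws if every replica of the range is blacklisted, since the range
    // could not be read from any node in that case.
    static List<String> liveReplicas(List<String> replicas, Set<String> deadHosts) {
        List<String> live = new ArrayList<>();
        for (String host : replicas) {
            if (!deadHosts.contains(host)) {
                live.add(host);
            }
        }
        if (live.isEmpty()) {
            throw new IllegalStateException("No live replica left for range: " + replicas);
        }
        return live;
    }

    public static void main(String[] args) {
        // Placeholder host names; the blacklist is filled in manually.
        Set<String> deadHosts = new HashSet<>(Arrays.asList("node-a"));
        List<String> replicas = Arrays.asList("node-a", "node-b");
        System.out.println(liveReplicas(replicas, deadHosts)); // prints [node-b]
    }
}
```

With RF=2, a single dead node still leaves one replica per range, so the split
could be assigned to that replica. The weak point is the hand-maintained set of
dead hosts, which is why this does not feel like an automatic procedure.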

Cheers,

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
