incubator-cassandra-user mailing list archives

From Martin McGovern <martin.mcgov...@gmail.com>
Subject CASSANDRA-2388 - ColumnFamilyRecordReader fails for a given split because a host is down
Date Fri, 03 Feb 2012 14:05:58 GMT
Hi,

Is the following scenario covered by 2388? I have a test cluster of 6 nodes
with a replication factor of 3. Each server can execute Hadoop tasks. One
Cassandra node (node 1) is down for the test.

1. The job is kicked off from the JobTracker on node 1.
2. A task is executed on node 1 and fails because the local Cassandra
   instance is down.
3. The task is retried on node 6, which tries to connect to node 1 and fails.
4. The task is retried on node 5, which tries to connect to node 1 and fails.
5. The task is retried on node 4, which tries to connect to node 1 and fails.
6. After 4 failures the task is killed and the job fails.

Nodes 2 and 3, which hold the other replicas, never run the task. The node
selection seems to be random. I can modify the Cassandra code to check
connectivity in ColumnFamilyRecordReader, but I suspect this is fixing the
wrong problem.
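
A minimal sketch of the kind of connectivity check described above, assuming
the reader has the split's replica hosts available as a list. The class
ReplicaFailover and its methods are hypothetical names for illustration and
are not part of Cassandra or Hadoop; the real ColumnFamilyRecordReader may
be structured quite differently.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper (not Cassandra code): pick the first replica of a
// split that accepts a connection, instead of always using the first host.
public class ReplicaFailover {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat the host as down.
            return false;
        }
    }

    // Walks the replica list in order and returns the first live host, if any.
    static Optional<String> firstLiveReplica(List<String> replicas,
                                             Predicate<String> isAlive) {
        for (String host : replicas) {
            if (isAlive.test(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Replicas for one split, as in the scenario: node1 is down.
        List<String> replicas = List.of("node1", "node2", "node3");
        // Simulated liveness check so the example runs without a cluster.
        Predicate<String> isAlive = host -> !host.equals("node1");
        System.out.println(firstLiveReplica(replicas, isAlive).orElse("none"));
    }
}
```

Against a real cluster the predicate could be something like
host -> ReplicaFailover.isReachable(host, rpcPort, 2000), where rpcPort is
whatever port the job is configured to use. This only works around the
symptom inside the reader; it does not change which node Hadoop schedules
the task on.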

Is there a reason that Hadoop cannot select the appropriate node? Is it a
configuration problem?
I've read
http://mail-archives.apache.org/mod_mbox/cassandra-user/201108.mbox/%3CCALdd-zhMWx5VKfn2EJx8pwOdp-0PNwqMrvHmeeT=5tHt+uXxSw@mail.gmail.com%3E
which seems to imply that this scenario will fail, but this comment from mck
seems to say it should work:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3C1315253057.7466.222.camel@localhost%3E

Thanks,
Martin
