cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mick Semb Wever <>
Subject Re: CASSANDRA-2388 - ColumnFamilyRecordReader fails for a given split because a host is down
Date Fri, 16 Mar 2012 12:55:09 GMT
Sorry for such a late reply. I'm not always keeping up with the mailing

> Is the following scenario covered by 2388? I have a test cluster of 6
> nodes with a replication factor of 3. Each server can execute hadoop
> tasks. 1 cassandra node is down for the test.
> The job is kicked off from node 1 jobtracker.
> A task is executed from node 1, and fails because the local cassandra
> instance is down
> retry on node 6, this tries to connect to node 1 and fails
> retry on node 5, this tries to connect to node 1 and fails
> retry on node 4, this tries to connect to node 1 and fails
> After 4 failures the task is killed and the job fails.
> Node 2 and 3 which contain the other replicas never run the task. The
> node selection seems to be random. I can modify the cassandra code to
> check connectivity in ColumnFamilyRecordReader but I suspect this is
> fixing the wrong problem.

There are two problems here.

1) hadoop's jobtracker isn't preferencing tasks to tasktracker that
would provide data locality.

2) connection replica nodes are never attempted directly, instead the
task must fail and be re-submitted to another tasktracker which
hopefully is a replica node.

> [snip] but this comment from mck seems to say it should work
> 3C1315253057.7466.222.camel@localhost%3E

not in your case. 
ColumnFamilyInputFormat splits the query into InputSplits. This is done
via the api calls describe_ring and describe_splits. These InputSplits
(ColumnFamilySplit) each has a list of locations which are the replica

Now hadoop is supposed to preference sending tasks to tasktrackers based
on the split's location. This is problem (1). I haven't seen it actually
work. The closest information i got is

Problem (2) is ColumnFamilyRecordReader.getLocation() returns you the
address from the list of locations for the current split that matches
the localhost. This preferences data locality. If none of the locations
is local then it simply returns the first location in the list. This
explains your use case not working. One fix for you to experiment with
is to increase the allowed task failures (i think it is
mapred.max.tracker.failures) to the number of nodes you have. Then each
node would be (randomly) tried before the task killed and job failed.


"Friendship with the upright, with the truthful and with the well
informed is beneficial. Friendship with those who flatter, with those
who are meek and who compromise with principles, and with those who talk
cleverly is harmful." Confucius 

| | |

View raw message