cassandra-user mailing list archives

From Mck <...@apache.org>
Subject Re: RF=1 w/ hadoop jobs
Date Thu, 01 Sep 2011 10:36:41 GMT
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote:
> But there is the another problem with Hadoop-Cassandra, if there is no
> node available for a range of keys, it fails on RuntimeError. For
> example having a keyspace with RF=1 and a node is down all MapReduce
> tasks fail. 

CASSANDRA-2388 is related but not the same.

Before 0.8.4, if the local Cassandra node didn't have the split's data,
the tasktracker would connect to another Cassandra node where the
split's data could be found.

So even before 0.8.4, with RF=1 and that node down, your Hadoop job
would fail, because no other node holds the split's data.
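
To make that concrete, here is a minimal sketch of the fallback in plain
Java, with no Cassandra or Hadoop dependencies. It is not the actual
Cassandra code: the class, method and node names are hypothetical, and
the set of live nodes is simply passed in as an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from the Cassandra source.
class SplitEndpointChooser {

    // Returns the node to read a split from: the local node if it is a live
    // replica for the split, otherwise the first live replica in the list.
    // Throws if no replica of the split is reachable.
    static String choose(String localHost, List<String> splitLocations, Set<String> liveNodes) {
        if (splitLocations.contains(localHost) && liveNodes.contains(localHost)) {
            return localHost;
        }
        for (String location : splitLocations) {
            if (liveNodes.contains(location)) {
                return location;
            }
        }
        throw new RuntimeException("no live replica for split " + splitLocations);
    }

    public static void main(String[] args) {
        Set<String> live = new HashSet<>(Arrays.asList("node1", "node3"));

        // RF=2: node2 is down, but node3 also holds the split's data.
        System.out.println(choose("node1", Arrays.asList("node2", "node3"), live));

        // RF=1: the split's only replica (node2) is down, so nothing can serve it.
        try {
            choose("node1", Collections.singletonList("node2"), live);
        } catch (RuntimeException e) {
            System.out.println("job would fail: " + e.getMessage());
        }
    }
}
```

With more than one replica the fallback finds another node; with RF=1
the list of locations has a single entry, so there is nothing to fall
back to.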

That said, I've reopened CASSANDRA-2388 (and reverted the code locally)
because the new behaviour in 0.8.4 leads to abysmal tasktracker
throughput: for me, task allocation doesn't seem to honour data locality
according to split.getLocations().

> I've reworked my previous patch, that was addressing this
> issue and now there are ConfigHelper methods for enable/disable
> ignoring unavailable ranges.
> It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) 

I'm interested in this patch and see its usefulness, but no one will act
on it until you attach it to an issue. (I think a new issue is
appropriate here.)

~mck

