cassandra-user mailing list archives

From Patrik Modesto <>
Subject Re: RF=1
Date Wed, 17 Aug 2011 12:01:57 GMT
And one more patch:
This one handles the case where no node is available for a
slice: for example, when there is a keyspace with RF=1 and a node is
shut down. Its range of keys gets ignored.
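A minimal sketch of the idea in that patch, with hypothetical names (the real change is against Cassandra 0.7.8's split logic, not this standalone helper): drop any token range whose every replica is unreachable, instead of failing the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SplitFilter {
    /**
     * Keeps only the ranges that still have at least one live replica.
     * A range whose every replica is down (e.g. RF=1 with the owning
     * node stopped) is silently skipped rather than failing the job.
     */
    public static List<String> liveRangesOnly(Map<String, List<String>> replicasByRange,
                                              Predicate<String> isLive) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : replicasByRange.entrySet()) {
            boolean anyLive = e.getValue().stream().anyMatch(isLive);
            if (anyLive) {
                usable.add(e.getKey());
            }
            // else: no reachable endpoint for this range -> ignore it
        }
        return usable;
    }
}
```

With RF=1 this is exactly the "ignore its range" behaviour asked for later in the thread: the dead node's range simply produces no input split.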


On Wed, Aug 17, 2011 at 13:28, Patrik Modesto <> wrote:
> Hi,
> while I was investigating this issue, I've found that hadoop+cassandra
> don't work if you stop even just one node in the cluster. It doesn't
> depend on RF. ColumnFamilyRecordReader gets a list of nodes (according
> to the RF) but chooses just the local host, and if there is no cassandra
> running locally it throws a RuntimeException, which in turn marks
> the MapReduce task as failed.
> I've created a patch that makes ColumnFamilyRecordReader try the
> local node first and, if that fails, try the other nodes in its list. The
> patch is here; I think attachments are
> not allowed on this ML.
> Please test it and apply. It's for version 0.7.8.
> Regards,
> P.
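The failover described in the quoted message might look like the sketch below (hypothetical helper names, not the actual patch): prefer the local endpoint, then fall back to the remaining replicas, and only give up when every one has failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ReplicaFailover {
    /**
     * Tries the preferred (local) endpoint first, then the remaining
     * replicas in order; returns the first successful result, or
     * rethrows the last failure if every endpoint is down.
     */
    public static <T> T withFailover(List<String> endpoints,
                                     String preferred,
                                     Function<String, T> connect) {
        List<String> ordered = new ArrayList<>(endpoints);
        if (ordered.remove(preferred)) {
            ordered.add(0, preferred); // local node goes first
        }
        RuntimeException last = new RuntimeException("no endpoints to try");
        for (String host : ordered) {
            try {
                return connect.apply(host);
            } catch (RuntimeException e) {
                last = e; // remember the failure, try the next replica
            }
        }
        throw last;
    }
}
```

This changes the behaviour complained about above: a task only fails when all replicas for its split are down, not when just the co-located node is.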
> On Wed, Aug 3, 2011 at 13:59, aaron morton <> wrote:
>> If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits() is the function that gets the splits.
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> On 3 Aug 2011, at 16:18, Patrik Modesto wrote:
>>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan
>>> <> wrote:
>>>> If you have RF=1, taking one node down is going to cause 25% of your
>>>> data to be unavailable (in a four-node cluster). If you want to tolerate a machine going down,
>>>> you need to have at least RF=2; if you want to use quorum and have a
>>>> machine go down, you need at least RF=3.
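The arithmetic behind that advice can be sketched as follows (a quick illustration, not Cassandra code): a quorum is floor(RF/2)+1 replicas, so RF=3 still has a quorum with one node down, while RF=2 and RF=1 do not.

```java
public class Quorum {
    /** Quorum size for a given replication factor: floor(RF / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** True if quorum operations can still succeed with `down` replicas lost. */
    static boolean toleratesDown(int rf, int down) {
        return rf - down >= quorum(rf);
    }
}
```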
>>> I know I can have RF > 1, but I have limited resources and I don't mind
>>> losing 25% of the data. RF > 1 basically means that if a node goes down I
>>> have the data elsewhere, but what I need is: if a node goes down, just
>>> ignore its range. I can handle that in my applications using Thrift, but
>>> hadoop-mapreduce can't handle it. It just fails with "Exception in
>>> thread "main" Could not get input splits". Is
>>> there a way to tell Hadoop to ignore this range?
>>> Regards,
>>> P.
