incubator-cassandra-user mailing list archives

From Patrik Modesto <patrik.mode...@gmail.com>
Subject Re: RF=1
Date Wed, 17 Aug 2011 12:01:57 GMT
And one more patch: http://pastebin.com/zfNPjtQz
This one handles the case where there are no nodes available for a
slice. For example, when there is a keyspace with RF=1 and a node is
shut down, its range of keys gets ignored.
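The combined behaviour of the two patches (prefer the local node, fall back to the other replicas, and skip the range when no replica answers) could be sketched roughly like this. This is only an illustration, not the actual pastebin patch; the class, method, and host names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: pick the first reachable endpoint for a split,
// preferring the local node, and return null (i.e. skip the range)
// when no replica is reachable instead of throwing an exception.
public class SplitEndpointChooser {
    public static String chooseEndpoint(List<String> replicas,
                                        String localHost,
                                        Predicate<String> isReachable) {
        // Try the local node first, as ColumnFamilyRecordReader does.
        if (replicas.contains(localHost) && isReachable.test(localHost)) {
            return localHost;
        }
        // Fall back to the remaining replicas rather than failing.
        for (String endpoint : replicas) {
            if (!endpoint.equals(localHost) && isReachable.test(endpoint)) {
                return endpoint;
            }
        }
        // No live replica: with RF=1 and the owning node down, the
        // caller would drop this range instead of failing the task.
        return null;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2");
        // Local node down, remote replica up: falls back to 10.0.0.2.
        System.out.println(chooseEndpoint(replicas, "10.0.0.1",
                h -> h.equals("10.0.0.2")));
        // All replicas down: range is skipped (prints null).
        System.out.println(chooseEndpoint(replicas, "10.0.0.1", h -> false));
    }
}
```

With all replicas down the chooser yields null, which a patched getSplits()/RecordReader would treat as "ignore this range" rather than aborting the whole MapReduce job.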

Regards,
P.

On Wed, Aug 17, 2011 at 13:28, Patrik Modesto <patrik.modesto@gmail.com> wrote:
> Hi,
>
> while I was investigating this issue, I've found that hadoop+cassandra
> don't work if you stop even just one node in the cluster. It doesn't
> depend on RF. ColumnFamilyRecordReader gets a list of nodes (according
> to the RF) but chooses just the local host, and if there is no cassandra
> running locally it throws a RuntimeException. Which in turn marks
> the MapReduce task as failed.
>
> I've created a patch that makes ColumnFamilyRecordReader try the
> local node and, if that fails, try the other nodes in its list. The
> patch is here: http://pastebin.com/0RdQ0HMx (I think attachments are
> not allowed on this ML).
>
> Please test it and apply. It's for version 0.7.8.
>
> Regards,
> P.
>
>
> On Wed, Aug 3, 2011 at 13:59, aaron morton <aaron@thelastpickle.com> wrote:
>> If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits() is the
>> function that gets the splits.
>>
>>
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3 Aug 2011, at 16:18, Patrik Modesto wrote:
>>
>>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan
>>> <jeremiah.jordan@morningstar.com> wrote:
>>>> If you have RF=1, taking one node down is going to cause 25% of your
>>>> data to be unavailable.  If you want to tolerate a machine going down
>>>> you need at least RF=2; if you want to use quorum and have a
>>>> machine go down, you need at least RF=3.
>>>
>>> I know I can have RF > 1, but I have limited resources and I don't mind
>>> losing 25% of the data. RF > 1 basically means that if a node goes down I
>>> have the data elsewhere, but what I need is for a downed node's range to
>>> simply be ignored. I can handle that in my applications using thrift, but
>>> hadoop-mapreduce can't handle it. It just fails with "Exception in
>>> thread "main" java.io.IOException: Could not get input splits". Is
>>> there a way to tell hadoop to ignore this range?
>>>
>>> Regards,
>>> P.
>>
>>
>
