From: Jonathan Ellis <jbellis@gmail.com>
Date: Wed, 17 Aug 2011 10:08:46 -0500
Subject: Re: RF=1
To: user@cassandra.apache.org

See https://issues.apache.org/jira/browse/CASSANDRA-2388

On Wed, Aug 17, 2011 at 6:28 AM, Patrik Modesto
<patrik.modesto@gmail.com> wrote:
> Hi,
>
> while I was investigating this issue, I found that hadoop+cassandra
> doesn't work if you stop even just one node in the cluster. It doesn't
> depend on RF. ColumnFamilyRecordReader gets a list of nodes (according
> to the RF) but chooses just the local host, and if there is no cassandra
> running locally it throws a RuntimeException, which in turn marks
> the MapReduce task as failed.
>
> I've created a patch that makes ColumnFamilyRecordReader try the
> local node first and, if that fails, try the other nodes in its list. The
> patch is here: http://pastebin.com/0RdQ0HMx (I think attachments are
> not allowed on this ML).
>
> Please test it and apply. It's for version 0.7.8.
>
> Regards,
> P.
>
>
> On Wed, Aug 3, 2011 at 13:59, aaron morton <aaron@thelastpickle.com> wrote:
>> If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits() is the function that gets the splits.
>>
>>
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3 Aug 2011, at 16:18, Patrik Modesto wrote:
>>
>>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan
>>> <jeremiah.jordan@morningstar.com> wrote:
>>>> If you have RF=1, taking one node down is going to cause 25% of your
>>>> data to be unavailable.  If you want to tolerate a machine going down
>>>> you need to have at least RF=2; if you want to use quorum and have a
>>>> machine go down, you need at least RF=3.
>>>
>>> I know I can have RF > 1 but I have limited resources and I don't mind
>>> losing 25% of the data. RF > 1 basically means that if a node goes down I
>>> have the data elsewhere, but what I need is to just ignore a node's
>>> range when it goes down. I can handle it in my applications using thrift, but
>>> hadoop-mapreduce can't handle it. It just fails with "Exception in
>>> thread "main" java.io.IOException: Could not get input splits". Is
>>> there a way to tell hadoop to ignore this range?
>>>
>>> Regards,
>>> P.
>>
>>
>
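For anyone following along, the fallback Patrik describes in the quoted message (try the local node first, then the other replicas in the split's list) could look roughly like the sketch below. This is not the pastebin patch and not the actual ColumnFamilyRecordReader code: the class ReplicaFallback, the method connectToFirstLive and the connect callback are hypothetical names made up for illustration, and it uses Java 8 functional interfaces for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Cassandra: tries the local host first,
// then each remaining replica, and fails only when every attempt has failed.
final class ReplicaFallback {

    static <T> T connectToFirstLive(String localHost,
                                    List<String> replicas,
                                    Function<String, T> connect) {
        // Order the candidates so the local host, if it is a replica, comes first.
        List<String> candidates = new ArrayList<>();
        if (replicas.contains(localHost)) {
            candidates.add(localHost);
        }
        for (String replica : replicas) {
            if (!replica.equals(localHost)) {
                candidates.add(replica);
            }
        }

        RuntimeException lastFailure = null;
        for (String host : candidates) {
            try {
                return connect.apply(host);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the next replica
            }
        }
        throw new RuntimeException("no replica reachable among " + candidates, lastFailure);
    }
}
```

Note that with RF=1 the replica list for a range has a single entry, so a fallback like this only helps when RF > 1; it does not by itself let a job skip a range whose only replica is down.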

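The RF numbers Jeremiah gives in the quoted thread follow from simple arithmetic, sketched here. The names ReplicationMath, quorum, toleratedAtQuorum and fractionUnavailableWithRf1 are made up for this example, and the 25% figure assumes a four-node cluster with evenly balanced tokens, which the thread implies but does not state.

```java
// Illustrative arithmetic only; class and method names are made up for this sketch.
final class ReplicationMath {

    // A quorum is a majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that may be down while a quorum can still be reached.
    static int toleratedAtQuorum(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    // With RF=1 and evenly balanced tokens, each node owns 1/N of the data.
    static double fractionUnavailableWithRf1(int nodes, int nodesDown) {
        return (double) nodesDown / nodes;
    }
}
```

Under these assumptions RF=2 still needs both replicas for a quorum, so RF=3 is the smallest setting that survives one node down at quorum.
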

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com