From: Florent Lefillâtre <flefilla@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 6 Mar 2012 12:26:42 +0100
Subject: Re: newer Cassandra + Hadoop = TimedOutException()

I remember a bug in the ColumnFamilyInputFormat class in 0.8.10: it tested
rpc_endpoints == "0.0.0.0" instead of rpc_endpoint.equals("0.0.0.0").
Maybe that can help you.
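A minimal, stand-alone sketch of why that comparison misbehaves in Java; the
class and variable names below are invented for illustration and this is not
the actual Cassandra source:

    public class EndpointCheckDemo {
        public static void main(String[] args) {
            // An rpc_endpoint value as it might be returned by the ring
            // description call: a String built at runtime, not the literal.
            String rpcEndpoint = new String("0.0.0.0");

            // Reference comparison: checks object identity, so it is false
            // here and the "rpc_address is unset" fallback is never taken.
            System.out.println(rpcEndpoint == "0.0.0.0");      // false

            // Value comparison: checks the characters, which is what the
            // split-building code needs.
            System.out.println(rpcEndpoint.equals("0.0.0.0")); // true
        }
    }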
On 6 March 2012 12:18, Florent Lefillâtre <flefilla@gmail.com> wrote:

> Excuse me, I had not understood.
> So, for me, the problem comes from the change in the ColumnFamilyInputFormat
> class between 0.8.7 and 0.8.10, where the splits are created (0.8.7 uses
> endpoints and 0.8.10 uses rpc_endpoints).
> With your config the split fails, so Hadoop doesn't run a map task on
> approximately 16384 rows (your cassandra.input.split.size) but on all the
> rows of a node (certainly far more than 16384).
> However, Hadoop estimates the task progress against 16384 inputs, which is
> why you see something like 9076.81%.
>
> If you can't change the rpc_address configuration, I don't know how you can
> solve your problem :/, sorry.
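As a rough illustration using the figures from the thread (not a measurement):
a task reporting 9076.81% complete against a 16384-row split has actually read
about 16384 × 90.7681 ≈ 1.49 million rows, consistent with the task walking an
entire node's data rather than a single split.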
> On 6 March 2012 11:53, Patrik Modesto <patrik.modesto@gmail.com> wrote:
>
>> Hi Florent,
>>
>> I don't change the server version; it is Cassandra 0.8.10. I change just
>> the version of cassandra-all in the pom.xml of the MapReduce job.
>>
>> I have 'rpc_address: 0.0.0.0' in cassandra.yaml because I want Cassandra
>> to bind RPC to all interfaces.
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre <flefilla@gmail.com> wrote:
>> > Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
>> > In my case the split of the token range failed.
>> > I commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
>> > Maybe check whether any configuration changed between 0.8.7 and 0.8.10.
>> >
>> > On 6 March 2012 09:32, Patrik Modesto <patrik.modesto@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I was recently trying the Hadoop job with cassandra-all 0.8.10 again,
>> >> and the timeouts I get are not because Cassandra can't handle the
>> >> requests. I've noticed there are several tasks that show progress of
>> >> several thousand percent; it seems they are looping over their range
>> >> of keys. I've run the job with debug enabled and the ranges look OK,
>> >> see http://pastebin.com/stVsFzLM
>> >>
>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> >> number of mappers the job creates:
>> >> 0.8.7:  4680
>> >> 0.8.10: 595
>> >>
>> >> Task                            Complete
>> >> task_201202281457_2027_m_000041 9076.81%
>> >> task_201202281457_2027_m_000073 9639.04%
>> >> task_201202281457_2027_m_000105 10538.60%
>> >> task_201202281457_2027_m_000108 9364.17%
>> >>
>> >> None of this happens with cassandra-all 0.8.7.
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <patrik.modesto@gmail.com> wrote:
>> >> > I'll alter these settings and will let you know.
>> >> >
>> >> > Regards,
>> >> > P.
>> >> >
>> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton <aaron@thelastpickle.com> wrote:
>> >> >> Have you tried lowering the batch size and increasing the timeout,
>> >> >> even just to get it to work?
>> >> >>
>> >> >> If you get a TimedOutException it means CL number of servers did not
>> >> >> respond in time.
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >> >>
>> >> >> Hi aaron,
>> >> >>
>> >> >> these are our current settings:
>> >> >>
>> >> >>     <property>
>> >> >>         <name>cassandra.range.batch.size</name>
>> >> >>         <value>1024</value>
>> >> >>     </property>
>> >> >>
>> >> >>     <property>
>> >> >>         <name>cassandra.input.split.size</name>
>> >> >>         <value>16384</value>
>> >> >>     </property>
>> >> >>
>> >> >> rpc_timeout_in_ms: 30000
>> >> >>
>> >> >> Regards,
>> >> >> P.
>> >> >>
>> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton <aaron@thelastpickle.com> wrote:
>> >> >>
>> >> >> What settings do you have for cassandra.range.batch.size and
>> >> >> rpc_timeout_in_ms? Have you tried reducing the first and/or
>> >> >> increasing the second?
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >> >>
>> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> >> >>
>> >> >> Did you see the notes here?
>> >> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>> >> >>
>> >> >> I'm not sure what you mean by the notes?
>> >> >>
>> >> >> I'm using the mapred.* settings suggested there:
>> >> >>
>> >> >>     <property>
>> >> >>         <name>mapred.max.tracker.failures</name>
>> >> >>         <value>20</value>
>> >> >>     </property>
>> >> >>     <property>
>> >> >>         <name>mapred.map.max.attempts</name>
>> >> >>         <value>20</value>
>> >> >>     </property>
>> >> >>     <property>
>> >> >>         <name>mapred.reduce.max.attempts</name>
>> >> >>         <value>20</value>
>> >> >>     </property>
>> >> >>
>> >> >> But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
>> >> >>
>> >> >> P.
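For reference, a minimal sketch of how the job-side properties quoted above
could be set programmatically instead of via XML. The class name is invented
for the example; the property keys and values are the ones from the thread.
Note that rpc_timeout_in_ms is a server-side setting and stays in
cassandra.yaml, not in the job configuration.

    import org.apache.hadoop.conf.Configuration;

    public class CassandraJobConfSketch {
        public static Configuration buildConf() {
            Configuration conf = new Configuration();
            // Rows fetched per Thrift range query by the record reader.
            conf.set("cassandra.range.batch.size", "1024");
            // Target number of rows per input split, i.e. per map task.
            conf.set("cassandra.input.split.size", "16384");
            // Retry settings suggested on the Cassandra Hadoop wiki page.
            conf.set("mapred.max.tracker.failures", "20");
            conf.set("mapred.map.max.attempts", "20");
            conf.set("mapred.reduce.max.attempts", "20");
            return conf;
        }
    }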