MIME-Version: 1.0
References: <76B06293-E79F-44A6-8490-788207BE26C8@gmail.com> <2C3D6FA9-3FC3-42B7-81D4-040EEB795E6C@gmail.com> <5A48F325-00B0-4783-9EE1-5DA3F7A146FB@thelastpickle.com>
Date: Wed, 7 Mar 2012 11:02:17 +0100
Subject: Re: newer Cassandra + Hadoop = TimedOutException()
From: Florent Lefillâtre
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d044784a731821a04baa44151

--f46d044784a731821a04baa44151
Content-Type: text/plain; charset=ISO-8859-1

If you want to try a test: in the CFIF.getSubSplits(String, String, TokenRange, Configuration) method, replace the loop over 'range.rpc_endpoints' with the same loop over 'range.endpoints'.
This method splits the token range of each node using the describe_splits method, but I think something goes wrong when you create a Cassandra connection to host '0.0.0.0'.
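
To make the idea concrete, here is a minimal sketch, not the actual CFIF code. The class and method names (SplitHostChooser, RangeEndpoints, connectHosts) are made up for illustration, and I am assuming the two endpoint lists are parallel (same size, same order). It is a slightly more conservative variant of the test above: it keeps the RPC address and only falls back to the matching entry of 'endpoints' when the RPC address is '0.0.0.0'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitHostChooser {

    // Hypothetical stand-in for the token range structure; only the two
    // endpoint lists discussed in this thread are modeled.
    static final class RangeEndpoints {
        final List<String> endpoints;    // node addresses of the replicas
        final List<String> rpcEndpoints; // addresses advertised for client connections

        RangeEndpoints(List<String> endpoints, List<String> rpcEndpoints) {
            this.endpoints = endpoints;
            this.rpcEndpoints = rpcEndpoints;
        }
    }

    // Returns the hosts a client could try, in order, to ask for sub-splits.
    // Assumption: endpoints.get(i) and rpcEndpoints.get(i) describe the same node.
    static List<String> connectHosts(RangeEndpoints range) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < range.rpcEndpoints.size(); i++) {
            String host = range.rpcEndpoints.get(i);
            // "0.0.0.0" means "listen on all interfaces"; it is not an
            // address a remote client can connect to.
            if (host == null || host.equals("0.0.0.0")) {
                host = range.endpoints.get(i);
            }
            hosts.add(host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        RangeEndpoints range = new RangeEndpoints(
                Arrays.asList("10.0.0.1", "10.0.0.2"),
                Arrays.asList("0.0.0.0", "10.0.0.2"));
        System.out.println(connectHosts(range));
    }
}
```

If the job behaves normally with hosts chosen this way, that would point at the '0.0.0.0' RPC address rather than at Cassandra being overloaded.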



On 7 March 2012 at 09:07, Patrik Modesto <patrik.modesto@gmail.com> wrote:
You're right, I wasn't looking in the right logs. Unfortunately I'd
need to restart the Hadoop TaskTracker with log level DEBUG, and that is not
possible at the moment. Pity it happens only in production with
terabytes of data, not in the test environment...

Regards,
P.

On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre <flefilla@gmail.com> wrote:
> CFRR.getProgress() is called by the child mapper tasks on each TaskTracker node,
> so the log should appear in
> ${hadoop_log_dir}/attempt_201202081707_0001_m_000000_0/syslog (or something
> like this) on the TaskTrackers, not in the client job logs.
> Are you sure you are looking at the right log file? I ask because in your first mail
> you linked the client job log.
> Maybe you can also log the size of each split in CFIF.
>
>
>
>
> On 6 March 2012 at 13:09, Patrik Modesto <patrik.modesto@gmail.com> wrote:
>
>> I've added a debug message in CFRR.getProgress() and I can't find
>> it in the debug output. It seems getProgress() has not been
>> called at all.
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <jeremy.hanna1234@gmail.com>
>> wrote:
>> > you may be running into this -
>> > https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
>> > really affects the execution of the job itself though.
>> >
>> > On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
>> >
>> >> Hi,
>> >>
>> >> I was recently trying a Hadoop job + cassandra-all 0.8.10 again, and the
>> >> timeouts I get are not because Cassandra can't handle the
>> >> requests. I've noticed there are several tasks that show progress of
>> >> several thousand percent. It seems like they are looping over their range of
>> >> keys. I've run the job with debug enabled and the ranges look OK, see
>> >> http://pastebin.com/stVsFzLM
>> >>
>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> >> number of mappers the job creates:
>> >> 0.8.7: 4680
>> >> 0.8.10: 595
>> >>
>> >> Task                              Complete
>> >> task_201202281457_2027_m_000041   9076.81%
>> >> task_201202281457_2027_m_000073   9639.04%
>> >> task_201202281457_2027_m_000105   10538.60%
>> >> task_201202281457_2027_m_000108   9364.17%
>> >>
>> >> None of this happens with cassandra-all 0.8.7.
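
A rough way to read those percentages, assuming (I have not verified this against the 0.8.10 source) that the record reader reports progress as rows read divided by the configured cassandra.input.split.size. Under that assumption, a task at several thousand percent has simply read far more rows than one split is supposed to hold. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
final class SplitProgress {

    // Assumed formula: rows read so far divided by the configured split size.
    // Nothing caps the result, so an oversized split goes past 1.0 (100%).
    static float progress(long rowsRead, int configuredSplitSize) {
        return (float) rowsRead / configuredSplitSize;
    }

    // Inverts the assumed formula: how many rows a reported percentage implies.
    static long impliedRows(double reportedPercent, int configuredSplitSize) {
        return Math.round(reportedPercent / 100.0 * configuredSplitSize);
    }

    public static void main(String[] args) {
        int splitSize = 16384; // cassandra.input.split.size quoted in this thread

        // A split of exactly the configured size ends at 100%.
        System.out.println(progress(16384, splitSize));

        // Rows implied by the 9076.81% reported for one of the tasks above.
        System.out.println(impliedRows(9076.81, splitSize));
    }
}
```

With the split size of 16384 from this thread, 9076.81% would correspond to roughly 1.5 million rows in a single split, which is why logging the real size of each split in CFIF seems worth doing.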
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
>> >> <patrik.modesto@gmail.com> wrote:
>> >>> I'll alter these settings and will let you know.
>> >>>
>> >>> Regards,
>> >>> P.
>> >>>
>> >>> On Tue, Feb 28, 2012 at 09:23, aaron morton <aaron@thelastpickle.com>
>> >>> wrote:
>> >>>> Have you tried lowering the batch size and increasing the timeout?
>> >>>> Even just to get it to work.
>> >>>>
>> >>>> If you get a TimedOutException, it means the number of servers required
>> >>>> by the consistency level (CL) did not respond in time.
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>> -----------------
>> >>>> Aaron Morton
>> >>>> Freelance Developer
>> >>>> @aaronmorton
>> >>>> http://www.thelastpickle.com
>> >>>>
>> >>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >>>>
>> >>>> Hi aaron,
>> >>>>
>> >>>> these are our current settings:
>> >>>>
>> >>>>      <property>
>> >>>>          <name>cassandra.range.batch.size</name>
>> >>>>          <value>1024</value>
>> >>>>      </property>
>> >>>>
>> >>>>      <property>
>> >>>>          <name>cassandra.input.split.size</name>
>> >>>>          <value>16384</value>
>> >>>>      </property>
>> >>>>
>> >>>> rpc_timeout_in_ms: 30000
>> >>>>
>> >>>> Regards,
>> >>>> P.
>> >>>>
>> >>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aaron@thelastpickle.com>
>> >>>> wrote:
>> >>>>
>> >>>> What settings do you have for cassandra.range.batch.size
>> >>>> and rpc_timeout_in_ms? Have you tried reducing the first and/or
>> >>>> increasing the second?
>> >>>>
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>>
>> >>>> -----------------
>> >>>>
>> >>>> Aaron Morton
>> >>>>
>> >>>> Freelance Developer
>> >>>>
>> >>>> @aaronmorton
>> >>>>
>> >>>> http://www.thelastpickle.com
>> >>>>
>> >>>>
>> >>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >>>>
>> >>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <edlinuxguru@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> Did you see the notes here?
>> >>>>
>> >>>> I'm not sure what you mean by the notes.
>> >>>>
>> >>>> I'm using the mapred.* settings suggested there:
>> >>>>
>> >>>>     <property>
>> >>>>         <name>mapred.max.tracker.failures</name>
>> >>>>         <value>20</value>
>> >>>>     </property>
>> >>>>     <property>
>> >>>>         <name>mapred.map.max.attempts</name>
>> >>>>         <value>20</value>
>> >>>>     </property>
>> >>>>     <property>
>> >>>>         <name>mapred.reduce.max.attempts</name>
>> >>>>         <value>20</value>
>> >>>>     </property>
>> >>>>
>> >>>> But I still see the timeouts that I don't get with cassandra-all 0.8.7.
>> >>>>
>> >>>> P.
>> >>>>
>> >>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >
>
>

--f46d044784a731821a04baa44151--