From: Bharath Mundlapudi
To: user@hadoop.apache.org
Subject: Re: Strange machine behavior
Date: Tue, 18 Dec 2012 00:42:19 -0800 (PST)
Message-ID: <1355820139.2532.YahooMailNeo@web126001.mail.ne1.yahoo.com>
References: <1355191570.78498.YahooMailNeo@web126002.mail.ne1.yahoo.com>

You may want to check why System time is high. Check your system call stats. This should give you some clue.

-Bharath
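One way to collect system call stats, as a rough sketch only: attach strace in counting mode to a suspect process for a few seconds. The helper name `syscall_summary` and the 10-second default are illustrative and do not come from this thread. The sketch assumes `strace` and GNU `timeout` are installed and that you are allowed to trace the target (typically as root or as the same user). strace only sees system calls, so kernel time spent outside them would not show up in its summary.

```sh
# Hypothetical helper: count the system calls made by one process
# (and its threads) for a limited time, then print strace's summary.
syscall_summary() {
    pid=$1
    secs=${2:-10}   # sampling window in seconds (assumed default)
    if [ -z "$pid" ]; then
        echo "usage: syscall_summary PID [SECONDS]" >&2
        return 2
    fi
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not found" >&2
        return 127
    fi
    # -c: per-syscall counts and time, -f: follow threads, -p: attach to PID.
    # SIGINT makes strace detach and print the summary table.
    timeout -s INT "$secs" strace -c -f -p "$pid"
}
```

Pass the PID of the TaskTracker or of one of the map task JVMs while the System time is high.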


________________________________
From: Robert Dyer <rdyer@iastate.edu>
To: user@hadoop.apache.org; Bharath Mundlapudi <bharathwork@yahoo.com>
Sent: Monday, December 10, 2012 7:32 PM
Subject: Re: Strange machine behavior

Yes, there is a performance impact. It should be visible from the graph I attached. Basically, the CPU is spending much more time in System, and User time drops accordingly.

When this happens (if I don't do a drop_caches in time), the MR job winds up taking significantly longer than usual.
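The User versus System split described above can be checked without a graph by comparing two samples of the first line of /proc/stat. The following is a minimal sketch; the function name `cpu_split` is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```sh
# Print the user and system share of all CPU time that elapsed between
# two "cpu ..." summary lines taken from /proc/stat.
cpu_split() {
    # $1 = earlier sample, $2 = later sample
    read -r tag u1 n1 s1 i1 w1 q1 sq1 st1 rest <<EOF
$1
EOF
    read -r tag u2 n2 s2 i2 w2 q2 sq2 st2 rest <<EOF
$2
EOF
    # Total jiffies in each sample (steal may be absent on old kernels).
    t1=$(( $u1 + $n1 + $s1 + $i1 + $w1 + $q1 + $sq1 + ${st1:-0} ))
    t2=$(( $u2 + $n2 + $s2 + $i2 + $w2 + $q2 + $sq2 + ${st2:-0} ))
    dt=$(( $t2 - $t1 ))
    if [ "$dt" -le 0 ]; then
        echo "no CPU time elapsed between samples" >&2
        return 1
    fi
    echo "user=$(( 100 * ($u2 - $u1) / $dt ))% system=$(( 100 * ($s2 - $s1) / $dt ))%"
}

# Example (Linux only):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   cpu_split "$a" "$b"
```

Percentages are truncated to whole numbers, which is enough to tell a mostly-User node from a mostly-System one.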


On Mon, Dec 10, 2012 at 8:06 PM, Bharath Mundlapudi <bharathwork@yahoo.com> wrote:

Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a large share of memory for cache.

-Bharath

________________________________
From: Andy Isaacson <adi@cloudera.com>
To: user@hadoop.apache.org
Sent: Monday, December 10, 2012 11:23 AM
Subject: Re: Strange machine behavior

What kernel did you see this on? Was there significant swap traffic
(si/so in vmstat output) during the high-system-time period?
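Besides watching the si/so columns of vmstat, swap traffic can be checked from the cumulative `pswpin` and `pswpout` page counters in /proc/vmstat. This is a small sketch under that assumption; the function names `swap_counters` and `swap_delta` are illustrative only.

```sh
# Read "name value" lines in /proc/vmstat format on stdin and print
# the cumulative swap-in and swap-out page counts as "IN OUT".
swap_counters() {
    swapin=0
    swapout=0
    while read -r name value; do
        case $name in
            pswpin)  swapin=$value ;;
            pswpout) swapout=$value ;;
        esac
    done
    echo "$swapin $swapout"
}

# Print how many pages were swapped in and out between two samples.
swap_delta() {
    # $1 = "IN OUT" before, $2 = "IN OUT" after
    set -- $1 $2
    echo "pages_in=$(( $3 - $1 )) pages_out=$(( $4 - $2 ))"
}

# Example (Linux only):
#   before=$(swap_counters < /proc/vmstat); sleep 10
#   after=$(swap_counters < /proc/vmstat)
#   swap_delta "$before" "$after"
```

Deltas that stay at zero during the high-system-time period would suggest that swapping is not the cause.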

BTW, you don't need to, nor do you want to, run sync(1) when
manipulating drop_caches; it just causes additional noise and
slowdown. drop_caches doesn't have any impact on correctness; it won't
cause data loss (by dropping a dirty page or whatever). I've had sync
calls take 10 minutes to complete, so the unnecessary impact can be
significant.
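A version of the cache drop without the surrounding sync calls could look like the sketch below. The function name `drop_page_cache` and its optional path argument are hypothetical; the argument exists only so the function can be tried against an ordinary file. Writing to the real file requires root. Dirty pages are not freed by this operation, which is why skipping sync is safe.

```sh
# Ask the kernel to drop clean caches, without calling sync first.
# Values: 1 = page cache, 2 = reclaimable slab objects such as
# dentries and inodes, 3 = both.
drop_page_cache() {
    target=${1:-/proc/sys/vm/drop_caches}   # override is for dry runs only
    echo 3 > "$target"
}

# Example (as root):
#   drop_page_cache
# or, from a normal account:
#   sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
```

Running sync beforehand only increases how much can be dropped; it is not needed to keep data safe.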

-andy

On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <rdyer@iastate.edu> wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during an MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas?  Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with an RS (8gb ram) on this machine.


--
Robert Dyer
rdyer@iastate.edu
