From: Bharath Mundlapudi
Reply-To: Bharath Mundlapudi
To: user@hadoop.apache.org
Date: Mon, 10 Dec 2012 18:06:10 -0800 (PST)
Subject: Re: Strange machine behavior
Message-ID: <1355191570.78498.YahooMailNeo@web126002.mail.ne1.yahoo.com>

Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a large share of memory for the page cache.

-Bharath


From: Andy Isaacson <adi@cloudera.com>
To: user@hadoop.apache.org
Sent: Monday, December 10, 2012 11:23 AM
Subject: Re: Strange machine behavior

What kernel did you see this on? Was there significant swap traffic
(si/so in vmstat output) during the high-system-time period?
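
A minimal sketch of the check being asked about here, assuming a procps-style `vmstat` whose header row names the `si` and `so` columns. The function name `swap_traffic` is hypothetical; it reads vmstat output on stdin and reports the largest swap-in and swap-out values seen.

```sh
# Summarise swap traffic from vmstat output read on stdin.
# Column positions are taken from the header row instead of being hard-coded.
swap_traffic() {
    awk '
        # Header row ("r b swpd ..."): remember which columns hold si and so.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows: numeric first field, seen after the header.
        si && so && $1 ~ /^[0-9]+$/ {
            if ($si + 0 > max_si) max_si = $si + 0
            if ($so + 0 > max_so) max_so = $so + 0
        }
        END { printf "max_si=%d max_so=%d\n", max_si, max_so }
    '
}

# Example (not run here): sample every 5 seconds, 12 times.
#   vmstat 5 12 | swap_traffic
```

The first row vmstat prints is an average since boot, so a nonzero value there does not by itself indicate current swapping.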

BTW, you don't need to, nor do you want to, run sync(1) when
manipulating drop_caches; it just causes additional noise and
slowdown. drop_caches doesn't have any impact on correctness; it won't
cause data loss (by dropping a dirty page or whatever). I've had sync
calls take 10 minutes to complete, so the unnecessary impact can be
significant.
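
As a sketch of the point above, dropping caches needs only a write to `/proc/sys/vm/drop_caches` (as root), with no `sync` before or after. The wrapper name `drop_caches` and the `DROP_CACHES_FILE` override are hypothetical, added only so the function can be tried against a scratch file.

```sh
# Ask the kernel to drop clean caches, without calling sync first.
# Modes: 1 = page cache, 2 = dentries and inodes, 3 = both.
# DROP_CACHES_FILE is an assumed override for trying this on a scratch file.
drop_caches() {
    mode=${1:-3}
    target=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
    case "$mode" in
        1|2|3) ;;
        *) echo "drop_caches: mode must be 1, 2 or 3" >&2; return 2 ;;
    esac
    # Needs root when target is the real /proc file.
    printf '%s\n' "$mode" > "$target"
}
```

Run as root, `drop_caches 3` does the same work as `sh -c 'echo 3 > /proc/sys/vm/drop_caches'`.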

-andy

On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <rdyer@iastate.edu> wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
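
To put the 45GB figure in context, here is a rough sketch that reports what share of RAM the kernel currently holds as page cache, assuming the usual `MemTotal:` and `Cached:` lines of `/proc/meminfo`. The function name `cache_share` is hypothetical.

```sh
# Report how much of total RAM is currently page cache.
# Reads /proc/meminfo by default; a different file can be passed as $1.
cache_share() {
    awk '
        $1 == "MemTotal:" { total = $2 }    # values are in kB
        $1 == "Cached:"   { cached = $2 }
        END {
            if (total <= 0) exit 1          # MemTotal line not found
            printf "cached_kB=%d total_kB=%d share=%d%%\n", cached, total, cached * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

With no argument, `cache_share` reads the live `/proc/meminfo`; `sysctl vm.swappiness` shows the current swappiness setting alongside it.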

