Date: Mon, 10 Dec 2012 11:23:54 -0800
Subject: Re: Strange machine behavior
From: Andy Isaacson
To: user@hadoop.apache.org

What kernel did you see this on? Was there significant swap traffic
(si/so in vmstat output) during the high-system-time period?

BTW, you don't need to, nor do you want to, run sync(1) when manipulating
drop_caches; it just causes additional noise and slowdown. drop_caches
doesn't have any impact on correctness; it won't cause data loss (by
dropping a dirty page or whatever). I've had sync calls take 10 minutes
to complete, so the unnecessary impact can be significant.

-andy

On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often). Note the extremely high System
> CPU time. Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
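
On the swap-traffic question (si/so in vmstat output): as a rough sketch, similar figures can be approximated by taking two snapshots of /proc/vmstat and diffing the pswpin/pswpout counters. The function names below are hypothetical, and the 4 KiB page size is an assumption that should be checked on the machine in question (for example with `getconf PAGESIZE`).

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <integer>" lines; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s, page_kib=4):
    """Return (swap-in, swap-out) in KiB/s between two snapshots.

    page_kib=4 is an assumed page size, not read from the system.
    """
    b, a = parse_vmstat(before), parse_vmstat(after)
    si = (a["pswpin"] - b["pswpin"]) * page_kib / interval_s
    so = (a["pswpout"] - b["pswpout"]) * page_kib / interval_s
    return si, so


def read_vmstat(path="/proc/vmstat"):
    """Read the raw counter text (Linux only)."""
    with open(path) as f:
        return f.read()
```

To use it, call `read_vmstat()` twice a few seconds apart during the high-system-time period and pass both snapshots and the interval to `swap_rates()`. Sustained non-zero values in either direction would suggest the machine is actually swapping rather than just holding a large page cache.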