Subject: Re: Improving MR job disk IO
From: Xuri Nagarin
To: user@hadoop.apache.org
Date: Thu, 10 Oct 2013 12:29:08 -0700

Thanks, Pradeep. Does that mean this job is a bad candidate for MR?

Interestingly, running the command-line '/bin/grep' as a streaming job gives (1) much better disk throughput and (2) a CPU load that is spread almost evenly across all cores/threads (no CPU gets pegged at 100%).

On Thu, Oct 10, 2013 at 11:15 AM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:
Actually... I believe that is expected behavior. Since your CPU is pegged at 100%, you're not going to be IO bound. Jobs typically tend to be either CPU bound or IO bound. If you're CPU bound, you should expect low IO throughput; if you're IO bound, you should expect low CPU usage.
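Pradeep's point can be made concrete with a simplified two-resource model: a node's effective read rate is the smaller of its disk bandwidth and the rate at which its running map tasks can process input. The Java sketch below assumes that model (real nodes also contend on memory, network and spill IO, and a hyperthread is not a full core). The class, method names and the per-task processing rate are hypothetical; only the 20 mappers per node, the 32 logical CPUs and a per-disk rate within Terasort's 25-35 MB/s range come from the thread.

```java
// A simplified two-resource bottleneck model (an illustrative assumption, not a
// Hadoop API): a node consumes input no faster than the slower of (a) what its
// disks deliver and (b) what its concurrently running map tasks can process.
public class BottleneckModel {

    /** Which resource limits throughput in this model. */
    enum Bound { CPU, IO }

    /** Estimated read rate for one node plus the resource that caps it. */
    static final class Estimate {
        final double mbPerSec;
        final Bound bound;

        Estimate(double mbPerSec, Bound bound) {
            this.mbPerSec = mbPerSec;
            this.bound = bound;
        }
    }

    /**
     * @param diskMbPerSec    aggregate read bandwidth of the node's disks (MB/s)
     * @param concurrentTasks map tasks running at once on the node
     * @param logicalCpus     hardware threads available to those tasks
     * @param perTaskMbPerSec input one task can process per second with a CPU to itself
     */
    static Estimate estimate(double diskMbPerSec, int concurrentTasks,
                             int logicalCpus, double perTaskMbPerSec) {
        // Tasks beyond the number of logical CPUs only time-share the CPUs.
        int runnable = Math.min(concurrentTasks, logicalCpus);
        double cpuMbPerSec = runnable * perTaskMbPerSec;
        return cpuMbPerSec < diskMbPerSec
                ? new Estimate(cpuMbPerSec, Bound.CPU)
                : new Estimate(diskMbPerSec, Bound.IO);
    }

    public static void main(String[] args) {
        // 20 mappers, 32 logical CPUs, ~240 MB/s of disk (8 disks x 30 MB/s).
        // The 2 MB/s per-task processing rate is a made-up value.
        Estimate e = estimate(240, 20, 32, 2);
        System.out.printf("%.0f MB/s, %s bound%n", e.mbPerSec, e.bound); // prints "40 MB/s, CPU bound"
    }
}
```

In this model, raising disk bandwidth does nothing for a CPU-bound job; only faster per-record processing or more runnable tasks moves the number.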

On Thu, Oct 10, 2013 at 11:05 AM, Xuri Nagarin <secsubs@gmail.com> wrote:
Hi,

I have a simple Grep job (from the bundled examples) that I am running on an 11-node cluster. Each node has two 8-core Intel Xeons (32 logical CPUs with HT on), 64 GB RAM and 8 x 1 TB disks. I have mappers set to 20 per node.
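For context, the map side of a regex-counting job such as the bundled Grep example does per-record CPU work of roughly this shape: scan each input line with a compiled pattern and emit one (match, 1) pair per hit, with the pairs later summed per match. The sketch below is a standalone approximation that uses only java.util.regex and in-memory lines; it is not the example's actual source and does not use the Hadoop Mapper API, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone approximation (not Hadoop code) of a regex-count map plus sum:
// every match in every line becomes a (matchedText, 1) pair, and the pairs
// are summed per key, as a combiner/reducer would do.
public class RegexCountSketch {

    static Map<String, Long> countMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);   // compiled once, reused for every line
        Map<String, Long> counts = new TreeMap<>(); // sorted only for readable output
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            // find() scans the whole line, so the CPU cost grows with line
            // length and with how expensive the pattern is to evaluate.
            while (matcher.find()) {
                counts.merge(matcher.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ERROR disk full",
                "INFO ok",
                "ERROR timeout ERROR retry");
        System.out.println(countMatches(lines, "ERROR|WARN")); // prints "{ERROR=3}"
    }
}
```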

When I run the Grep job, I notice that CPU gets pegged at 100% on multiple cores, but disk throughput remains a dismal 1-2 MB/s on a single disk on each node. So I guess the cluster is performing poorly in terms of disk IO. Running Terasort, I see each disk put out 25-35 MB/s, with a total cluster throughput above 1.5 GB/s.
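The Grep and Terasort figures can be put side by side with a little arithmetic. The sketch below assumes 1 GB = 1000 MB and, for Grep, that only the single busy disk per node contributes; the node, disk and throughput numbers are taken from the message, and the helper names are hypothetical.

```java
// Back-of-envelope comparison of the throughput figures quoted in the thread.
// Inputs come from the message; the helper names are hypothetical.
public class ThroughputComparison {

    static final int NODES = 11;
    static final int DISKS_PER_NODE = 8;

    /** Cluster-wide MB/s if each node reads perNodeMbPerSec. */
    static double clusterMbPerSec(double perNodeMbPerSec) {
        return perNodeMbPerSec * NODES;
    }

    /** Cluster-wide MB/s if every disk on every node streams at perDiskMbPerSec. */
    static double allDisksMbPerSec(double perDiskMbPerSec) {
        return perDiskMbPerSec * DISKS_PER_NODE * NODES;
    }

    public static void main(String[] args) {
        // Grep: 1-2 MB/s on a single disk per node.
        double grepLow = clusterMbPerSec(1), grepHigh = clusterMbPerSec(2);
        // Terasort: 25-35 MB/s per disk; reported cluster total above 1.5 GB/s.
        double teraLow = allDisksMbPerSec(25), teraHigh = allDisksMbPerSec(35);
        double teraReported = 1500; // "above 1.5 GB/s", taken as 1500 MB/s

        System.out.printf("Grep:     %.0f-%.0f MB/s cluster-wide%n", grepLow, grepHigh);
        System.out.printf("Terasort: %.0f-%.0f MB/s if all disks stream at once%n", teraLow, teraHigh);
        System.out.printf("Reported Terasort total is at least %.0f-%.0fx the Grep total%n",
                teraReported / grepHigh, teraReported / grepLow);
    }
}
```

With these inputs Grep reads about 11-22 MB/s across the whole cluster, while the reported Terasort total is at least roughly 68-136 times higher.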

How do I go about reconfiguring or rewriting the job to make maximum use of the available disk IO?

TIA,

Xuri