From: Xuri Nagarin <secsubs@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 10 Oct 2013 13:50:46 -0700 (PDT)
Subject: Re: Improving MR job disk IO

On Thu, Oct 10, 2013 at 1:27 PM, Pradeep Gollakota wrote:

> I don't think it necessarily means that the job is a bad candidate for MR.
> It's a different type of workload. Hortonworks has a great article on the
> different types of workloads you might see and how that affects your
> provisioning choices at
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html

One statement that stood out to me in the link above is: "For these reasons, Hortonworks recommends that you either use the Balanced workload configuration or invest in a pilot Hadoop cluster and plan to evolve as you analyze the workload patterns in your environment."

Now, this is not a critique/concern of Hortonworks but rather of Hadoop: what if my workloads can be both CPU and IO intensive? Do I just take the approach of throwing in enough excess hardware just in case?

> I have not looked at the Grep code so I'm not sure why it's behaving the
> way it is. Still curious that streaming has a higher IO throughput and
> lower CPU usage. It may have to do with the fact that /bin/grep is a native
> implementation and Grep (Hadoop) is probably using the Java Pattern/Matcher
> API.

The Grep code is from the bundled examples in CDH. I made a one-line modification for it to read SequenceFiles.
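For readers following the thread: the bundled Grep example mentioned here is typically launched along these lines (the jar location, HDFS paths, and regex below are placeholders, not taken from the thread; the pattern argument is a Java regex, which is why the mapper goes through java.util.regex rather than native grep):

```shell
# Launch the Grep example bundled with the Hadoop/CDH examples jar.
# Jar path, input/output paths, and the regex are illustrative only.
hadoop jar /usr/lib/hadoop/hadoop-examples.jar grep \
    /user/xuri/input /user/xuri/grep-out 'ERROR.*timeout'
```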
The streaming job probably does not have lower CPU utilization, but I do see that it evens out the CPU utilization across all the available processors. I guess the native grep binary parallelizes better than the Java MR job?

Which brings me to ask: if you have the mapper/reducer functionality built into a platform-specific binary, won't it always be more efficient than a Java MR job? And in such cases, am I better off with streaming than with Java MR?

Thanks for your responses.

> On Thu, Oct 10, 2013 at 12:29 PM, Xuri Nagarin wrote:
>
>> Thanks Pradeep. Does it mean this job is a bad candidate for MR?
>>
>> Interestingly, running the cmdline '/bin/grep' under a streaming job
>> provides (1) much better disk throughput and (2) CPU load that is almost
>> evenly spread across all cores/threads (no CPU gets pegged at 100%).
>>
>> On Thu, Oct 10, 2013 at 11:15 AM, Pradeep Gollakota wrote:
>>
>>> Actually... I believe that is expected behavior. Since your CPU is
>>> pegged at 100%, you're not going to be IO bound. Typically, jobs tend
>>> to be CPU bound or IO bound. If you're CPU bound, you expect to see
>>> low IO throughput. If you're IO bound, you expect to see low CPU usage.
>>>
>>> On Thu, Oct 10, 2013 at 11:05 AM, Xuri Nagarin wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a simple Grep job (from the bundled examples) that I am running
>>>> on an 11-node cluster. Each node has 2x8-core Intel Xeons (shows 32
>>>> CPUs with HT on), 64GB RAM, and 8 x 1TB disks. I have mappers set to
>>>> 20 per node.
>>>>
>>>> When I run the Grep job, I notice that CPU gets pegged at 100% on
>>>> multiple cores, but disk throughput remains a dismal 1-2 MBytes/sec on
>>>> a single disk on each node. So I guess the cluster is performing poorly
>>>> in terms of disk IO. Running TeraSort, I see each disk put out 25-35
>>>> MBytes/sec, with a total cluster throughput above 1.5 GBytes/sec.
>>>>
>>>> How do I go about re-configuring or re-writing the job to utilize
>>>> maximum disk IO?
>>>> TIA,
>>>>
>>>> Xuri
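The streaming setup discussed in this thread can be sketched roughly as follows (the streaming jar location and HDFS paths are placeholders and vary by distribution; this is an illustrative sketch, not the exact command used above):

```shell
# Hadoop streaming job with the native /bin/grep as the mapper, no reducers.
# Jar and HDFS paths are illustrative placeholders, not from the thread.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapred.reduce.tasks=0 \
    -input /user/xuri/input \
    -output /user/xuri/grep-stream-out \
    -mapper '/bin/grep pattern'
```

One caveat worth noting: grep exits with status 1 when a map task's input contains no match, and streaming treats a non-zero mapper exit status as task failure by default, so wrapping the command (e.g. `-mapper "sh -c 'grep pattern; true'"`) is a common workaround.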