Date: Fri, 7 Jun 2013 11:15:43 +0800
Subject: Re: Why my tests shows Yarn is worse than MRv1 for terasort?
From: sam liu <samliuhadoop@gmail.com>
To: Marcos Luis Ortiz Valmaseda
Cc: user@hadoop.apache.org

At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn. They have many differences, and to keep the comparison fair I did not tune either configuration at all, which is how I got the test results above. After analyzing the results, I will of course configure both and compare again.

Do you have any thoughts on the current results? My reading is that, compared with MRv1, Yarn is better in the Map phase (teragen test) but worse in the Reduce phase (terasort test). Also, do you have any detailed suggestions/comments/materials on Yarn performance tuning?

Thanks!

2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com>
> Why not tune the configurations?
> Both frameworks have many areas to tune:
> - Combiners, shuffle optimization, block size, etc.
>
>
> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>> Hi Experts,
>>
>> We are thinking about whether to use Yarn in the near future, so I ran
>> teragen/terasort on Yarn and MRv1 for comparison.
>>
>> My environment is a three-node cluster, and each node has similar
>> hardware: 2 CPUs (4 cores), 32 GB memory. Both the Yarn and MRv1 clusters
>> are set up on the same environment. To be fair, I did not do any
>> performance tuning of their configurations, but used the default
>> configuration values.
>>
>> Before testing, I expected Yarn to be much better than MRv1 if both used
>> the default configuration, because Yarn is a better framework than MRv1.
>> However, the test results show some differences:
>>
>> MRv1: Hadoop-1.1.1
>> Yarn: Hadoop-2.0.4
>>
>> (A) Teragen: generate 10 GB of data:
>> - MRv1: 193 sec
>> - Yarn: 69 sec
>> *Yarn is 2.8 times faster than MRv1*
>>
>> (B) Terasort: sort 10 GB of data:
>> - MRv1: 451 sec
>> - Yarn: 1136 sec
>> *Yarn is 2.5 times slower than MRv1*
>>
>> After a quick analysis, I think the direct cause might be that Yarn is
>> much faster than MRv1 in the Map phase, but much slower in the Reduce
>> phase.
>>
>> Here I have two questions:
>> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
>> *- What is the strategy for tuning Yarn performance? Are there any
>> materials on it?*
>>
>> Thanks!
>
>
> --
> Marcos Ortiz Valmaseda
> Product Manager at PDVSA
> http://about.me/marcosortiz
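[As a side note on the tuning areas Marcos mentions: in terasort the reduce-phase gap is usually dominated by shuffle/sort settings and by the number of reducers. Below is a minimal sketch of the Hadoop 2.x shuffle-related properties one might adjust in mapred-site.xml. The values shown are illustrative starting points only, not settings verified against this cluster.]

```xml
<!-- mapred-site.xml: shuffle/sort knobs that affect the reduce phase.
     Values are illustrative; Hadoop 2.x defaults noted in comments. -->
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>   <!-- map-side sort buffer in MB; default 100 -->
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>64</value>    <!-- streams merged at once during sort; default 10 -->
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>20</value>    <!-- parallel fetch threads per reducer; default 5 -->
  </property>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>6</value>     <!-- reducers per job; default is 1, which serializes
                              the entire sort on a single reducer -->
  </property>
</configuration>
```

[On a three-node cluster it is also worth checking that yarn.nodemanager.resource.memory-mb and the per-container memory settings allow Yarn to run as many concurrent reduce tasks as MRv1's mapred.tasktracker.reduce.tasks.maximum slots did; otherwise the two frameworks are not sorting with the same parallelism.]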