hadoop-user mailing list archives

From Marcos Luis Ortiz Valmaseda <marcosluis2...@gmail.com>
Subject Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Date Fri, 07 Jun 2013 03:26:12 GMT
I'm not an expert at tuning YARN, but you can try Terasort, doing something
similar on both MRv1 and YARN.
I think that Arun and his team could be a very good help with it.
Some links?

It would be nice if, once you do this, you shared your results in a blog post
or in a research article, to spread the word about your findings.

Best wishes.

2013/6/6 sam liu <samliuhadoop@gmail.com>

> At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn.
> But they have many differences, and to be fair in the comparison I did not
> tune their configurations at all. So I got the above test results. After
> analyzing the test results, no doubt, I will configure them and do the
> comparison again.
> Do you have any thoughts on the current test results? I think that, compared
> with MRv1, Yarn is better in the Map phase (teragen test), but worse in the
> Reduce phase (terasort test).
> And any detailed suggestions/comments/materials on Yarn performance
> tuning?
> Thanks!
> 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com>
>> Why not tune the configurations?
>> Both frameworks have many areas to tune:
>> - Combiners, Shuffle optimization, Block size, etc.
>> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>>> Hi Experts,
>>> We are thinking about whether to use Yarn in the near future, and
>>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>> My env is a three-node cluster, and each node has similar hardware: 2
>>> CPUs (4 cores), 32 mem. Both the Yarn and MRv1 clusters are set up on the
>>> same env. To be fair, I did not do any performance tuning on their
>>> configurations, but used the default configuration values.
>>> Before testing, I thought Yarn would be much better than MRv1 if they both
>>> used the default configuration, because Yarn is a better framework than MRv1.
>>> However, the test result shows some differences:
>>> MRv1: Hadoop-1.1.1
>>> Yarn: Hadoop-2.0.4
>>> (A) Teragen: generate 10 GB data:
>>> - MRv1: 193 sec
>>> - Yarn: 69 sec
>>> *Yarn is 2.8 times better than MRv1*
>>> (B) Terasort: sort 10 GB data:
>>> - MRv1: 451 sec
>>> - Yarn: 1136 sec
>>> *Yarn is 2.5 times worse than MRv1*
>>> After a quick analysis, I think the direct cause might be that Yarn is
>>> much faster than MRv1 in the Map phase, but much slower in the Reduce phase.
>>> Here I have two questions:
>>> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
>>> *- What's the strategy for tuning Yarn performance? Are there any materials?*
>>> Thanks!
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz

Marcos Ortiz Valmaseda
Product Manager at PDVSA
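
PS: For anyone re-running the comparison after tuning, here is a minimal sketch of how the "times better/worse" figures quoted above can be recomputed from the raw timings. The numbers are copied from the quoted message; the dictionary layout and the `speedup` helper are my own illustration, not part of any Hadoop tool.

```python
# Wall-clock times in seconds, copied from the quoted benchmark
# (MRv1 = Hadoop-1.1.1, Yarn = Hadoop-2.0.4, 10 GB of data).
timings = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def speedup(mrv1_sec, yarn_sec):
    """Hypothetical helper: how many times faster Yarn was than MRv1.

    A value below 1 means Yarn was slower.
    """
    return mrv1_sec / yarn_sec


ratios = {job: speedup(t["MRv1"], t["Yarn"]) for job, t in timings.items()}

for job, ratio in ratios.items():
    if ratio >= 1:
        print(f"{job}: Yarn is {ratio:.1f}x faster than MRv1")
    else:
        print(f"{job}: Yarn is {1 / ratio:.1f}x slower than MRv1")
```

Replacing the values in `timings` with the times from a tuned run gives directly comparable ratios.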
