hadoop-mapreduce-user mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: MapReduce scalability study
Date Thu, 22 May 2014 20:47:28 GMT

On Thursday, May 22, 2014 10:17:42 PM Sylvain Gault wrote:
> Hello,
> I'm new to this mailing list, so forgive me if I don't do everything
> right.
> I didn't know whether I should ask on this mailing list or on
> mapreduce-dev or on yarn-dev. So I'll just start there. ^^
> Short story: I'm looking for some paper(s) studying the scalability
> of Hadoop MapReduce. And I found this extremely difficult to find on
> google scholar. Do you have something worth citing in a PhD thesis?
> Long story: I'm writing my PhD thesis about MapReduce and when I talk
> about Hadoop I'd like to say "how much it scales". Two years ago I heard
> some people say that "Yahoo! got it to scale up to 4000 nodes and plans
> to try on 6000 nodes" or something like that. I also heard that
> YARN/MRv2 should scale better, but I don't plan to talk much about
> YARN/MRv2. So I'd take anything I could cite as a reference in my
> manuscript. :)
Hello, Sylvain.
One of the reasons the Hadoop dev team began to work on YARN was precisely
to build a more scalable and resource-efficient Hadoop system, so if you actually
want to talk about Hadoop scalability, you should talk about YARN and MR2.

The paper is here:

and the related JIRA issues here:

You should talk with Arun C Murthy, Chief Architect at Hortonworks, about all these
topics. He could help you much more than I can.

Marcos Ortiz[1] (@marcosluis2186[2])
> Best regards,
> Sylvain Gault

[1] http://www.linkedin.com/in/mlortiz
[2] http://twitter.com/marcosluis2186
[3] http://about.me/marcosortiz
