hadoop-mapreduce-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Hadoop optimization for Lustre FS
Date Wed, 16 May 2012 09:54:50 GMT

The wiki page http://wiki.apache.org/hadoop/HowToContribute describes the steps in more detail.
In general, to push the patch upstream you file a Map/Reduce JIRA and attach your patch to it.
After that, several people from the community are likely to comment on the JIRA.  If you don't
get feedback, you can bug us about it on the dev mailing list.  As part of this you will also
need to port the patch to trunk, because we do not want new features going into any release
line without also going into trunk.  That may sound complex, since trunk uses YARN instead of
the previous Map/Reduce-specific framework, but both 1.0 and trunk are in the process of
getting a pluggable shuffle service (MAPREDUCE-4049).  It would probably be best to port your
patch to be a plugin for that service.  Then, hopefully, porting between trunk and 1.0 will be
relatively simple.

If this is the route you want to go, you should put 1.1 and 3.0.0 as the target versions of
the JIRA.  3.0.0 corresponds to trunk, and 1.1 is the next release of the 1.x line that is
accepting new major feature work.  If you are making it a plugin, you probably also want to
link your JIRA to the MAPREDUCE-4049 JIRA as a dependency.

In addition, because this is an optimization, it would be nice to have some information in the
JIRA showing the benchmarks you ran and the performance improvements you got.  Ultimately
we are also going to want some documentation for this, but that can come later, after you
lock down the code more.

--Bobby Evans

On 5/16/12 3:34 AM, "Alexander Zarochentsev" <alexander_zarochentsev@xyratex.com> wrote:


There is an optimization for Hadoop on Lustre FS, or any
high-performance distributed filesystem.

The research paper with test results can be found here
and a presentation for LUG 2011:

Basically, the optimization replaces the HTTP transport in the
shuffle phase with simply linking the target file to the source one. I
attached a draft patch against hadoop-1.0.0 to illustrate the idea.
How can I push this patch upstream?
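
To make the idea concrete, here is a minimal sketch of linking a map output into the
reducer's input location instead of copying it over HTTP. It is not the attached patch
and does not use any Hadoop API. The class name ShuffleLinkSketch, the method
linkMapOutput, and the file layout are hypothetical, and the choice of a hard link
(rather than a symbolic link) is an assumption. It relies only on both paths living on
the same shared filesystem, as they would on Lustre.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: not part of Hadoop or of the attached patch.
final class ShuffleLinkSketch {

    // Exposes a finished map output at the reducer's target path by creating
    // a hard link, so no bytes are copied and no HTTP fetch is needed.
    // Assumes source and target are on the same (shared) filesystem.
    static Path linkMapOutput(Path source, Path target) throws IOException {
        if (!Files.isRegularFile(source)) {
            throw new IOException("map output not found: " + source);
        }
        Files.createDirectories(target.getParent());
        Files.deleteIfExists(target); // replace a stale link from a retried attempt
        return Files.createLink(target, source);
    }

    public static void main(String[] args) throws IOException {
        // Made-up directory layout standing in for map and reduce work dirs.
        Path root = Files.createTempDirectory("shuffle-sketch");
        Path source = root.resolve("map_0").resolve("file.out");
        Files.createDirectories(source.getParent());
        Files.write(source, "k1\tv1\n".getBytes(StandardCharsets.UTF_8));

        Path target = linkMapOutput(source, root.resolve("reduce_0").resolve("map_0.out"));
        System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

A real implementation would also have to deal with partitioned map outputs (each
reducer needs only its own segment of a map output file), which this sketch ignores.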


Alexander "Zam" Zarochentsev

