hadoop-mapreduce-dev mailing list archives

From Eli Collins <...@cloudera.com>
Subject MR1 next steps
Date Thu, 07 Jul 2011 16:58:19 GMT
Hey gang,

Had some discussion about what to do with MR1 with Arun at the summit,
wanted to move it on-list. Was thinking we should sort these out some
on mr-dev before discussing/announcing a decision on general.

The question is, now that we'll soon have MR2 merged (hurray!), to
what extent do we want to support MR1?  By MR1 I mean the JT and TT,
not the old MR API, which MR2 supports. Ie this isn't about job API
compatibility, it's about implementation compatibility (eg existing
systems which may depend on JT/TT interfaces like metrics). Here are
the options as I see them:

1. Do nothing. MR1 will continue to be a regression, both in terms of
features and stability, against the MR in 203. Eg, MR1 in trunk still
doesn't support security. We would continue to recommend people use
MR1 from 20 (and MR2 from 23). Unclear what the value of having MR1 in
trunk in this shape is.

2. Remove the MR1 code from trunk/23, and just support MR2 in 23.
People who want MR1 can use the current stable release (which, per
option 1, we would recommend even if we left the code in as is).

3. Get MR1 in trunk in shape comparable to MR in 203. This preserves
the additional changes (to JT/TT at least) that have been added in
trunk since 0.20. Not clear if anyone would want to invest the
considerable effort this would take given that we have MR2 now (and
existing releases).

4. Put the MR1 code from 203 into trunk. This overwrites the changes
added to trunk not in 203, and would require some integration, however
it would give us a solid MR1 implementation that could be used in the
same release as MR2. It would be an incompatible change wrt 21/22,
however would be compatible in the sense that there are now both valid
MR1 and MR2 options in a single release.

I think #2 makes the most sense. From a developer perspective, MR2 is
good stuff, there's no need for us to maintain two implementations in
trunk/23 since we're already maintaining MR1 in the current releases.
I'm skeptical that anyone would volunteer to do #3 (lot of work,
unclear gain) or #4 (we already maintain MR1 elsewhere).  This allows
us to focus energy on MR2 instead of investing in MR1 (eg MR-2178,
which hasn't made much progress for ages).  From a user perspective,
MR2 preserves Job compatibility, so it should just be programs that talk
to the JT/TT that are affected. MR2 is a little harder to run
out-of-the-box, however we can fix that and we don't recommend people
use MR1 from 21/22/trunk anyway.

Thoughts?

Thanks,
Eli
