hadoop-mapreduce-user mailing list archives

From Eric <eric.x...@gmail.com>
Subject Re: Which version to choose
Date Wed, 22 Dec 2010 20:32:17 GMT
I appreciate the insightful comments, Todd. I now understand that 0.21 is not
a production release and never will be. That makes me much more confident
about continuing to work with the CDH3 version. It's difficult to get started
with Hadoop because the information is so scattered. The fact that the
libraries are marked deprecated is confusing too. Someone should write this
down for newcomers: use the old libraries. They are deprecated, but they are
the best choice for now since they are complete and well tested.

2010/12/22 Todd Lipcon <todd@cloudera.com>

> Hi Eric,
>
> Some thoughts inline below:
>
> On Wed, Dec 22, 2010 at 3:39 AM, Eric <eric.xkcd@gmail.com> wrote:
>
>> This question may have been asked numerous times, and the answer will
>> probably come down to the specific situation you are in, but I'm going to
>> ask anyway:
>>
>> Which Hadoop version should I pick?
>>
>> I'm currently running Cloudera's CDH3 beta release, but I'm very tempted
>> to install the latest Apache 0.21 version instead.
>>
>> Problems I encountered are:
>> * Cloudera's distribution has bugs, like pid file directories that
>> disappear after a reboot (because they are on a memory disk).
>>
>
> I think you'll find that all software has bugs - that's the nature of
> software :) We try to fix bugs as they're reported, and if you're having
> issues that you believe to be Cloudera-specific bugs please file them on
> issues.cloudera.org or the cdh-user mailing list, and we'll take a look.
> As you said above, CDH3 is still in a beta release, so we expect some bugs
> and are working hard to iron them out.
>
> Of course, the Apache release and any other release have bugs, too!
>
>
>> * I'm writing code against deprecated libraries :-( The new libraries are
>> not yet complete in release 0.20.x.
>>
>>
> In fact many of the backports in CDH3 are "new-API" versions of various
> input formats, partitioners, etc, backported from Apache trunk.
>
> Keep in mind that the "old API", though marked deprecated in 0.20, is not
> really deprecated. It will continue to be supported in 0.22 and probably
> 0.23 as well. There are just too many existing jobs written against these
> APIs to actually get rid of them without a multi-year transition plan, for
> better or for worse. In 0.21 we actually un-deprecated them!
>
>
>> I'm not (yet) running a production cluster, but I'm planning on turning it
>> into a production cluster in a few months. I do not feel comfortable writing
>> code against deprecated libraries, but I also don't feel comfortable
>> installing a Hadoop release that is not well tested and declared stable. I
>> am experimenting now, so chances are that 0.21 will become stable over the
>> coming months and will be a stable release once I go into production.
>>
>
> I don't think anyone plans on maintaining the 0.21 branch (please correct
> me if I'm wrong). As it says in the release notes, it was an unstable
> release made mostly to make sure we could still release after the project
> split. The next stable release will likely be 0.22 or 0.23.
>
> If you'd like to help with development, please do run the 0.21 release and
> help us fix bugs. Those bug fixes will then end up in our next stable
> release and everyone will be the better for it! If you're a more typical
> user, you're probably better off sticking with the more stable 0.20 release.
>
>
>> If I may ask, what are you running? I can imagine large companies are not
>> running the latest version of Hadoop and/or HBase.
>>
>
> I know of no one running Hadoop 0.21 for a serious workload. Most companies
> are running 0.20.2 or one of the branches based on this release (including
> CDH or the git repos available from Yahoo or Facebook).
>
>
>> Or am I wrong? Are you guys patching old releases or are you keeping up
>> with new releases instead? Are there advantages to running Cloudera's
>> packages instead of the Apache releases (besides that it is slightly easier
>> to install)?
>>
>>
> This is an ASF list so I won't address this here. I'll let other users
> answer this question if they so choose.
>
> Feel free to redirect this question to the cdh-user mailing list and some
> Cloudera employees can help you out.
>
> Thanks
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
