hadoop-general mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Re: Choosing a Hadoop distribution
Date Fri, 21 Sep 2012 03:45:32 GMT
Thanks, that all seems quite reasonable I suppose.

Cheers!

On Sep 20, 2012, at 11:22 , Aaron Eng wrote:

>> I'm tasked with creating a guide that instructs on how to choose a Hadoop
>> distribution from the handful of common options.
>> Does anyone have any thoughts on what criteria might govern such a
>> decision?
> 
> What problem(s) are you trying to solve with Hadoop (and related projects)?
> What are your expectations of the technology?
> 
> The details beyond that level could take many, many pages to cover.
> 
> Not all Hadoop distributions are tested the same way, packaged with the
> same components, etc.  Not all components of a given Hadoop distribution
> work with other Hadoop distributions.  There are a lot of common things
> between distributions, which is probably why it's difficult to articulate how
> to choose one over another.  So when you look at the problem you are
> trying to solve and your expectations of the technology, many things may
> seem relatively equal and hence you may need to get into some significant
> level of detail to pick something that best solves your problem.  In some
> cases it may be very straightforward as to whether a distribution will meet
> your requirements.  In other cases, things may look relatively equal across
> the board until you drill down to a point where you find differentiation
> (or maybe you don't find it).  But those would be my criteria: articulate the
> problem and expectations and compare functionality until you find
> differentiation.
> 
> 
> On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
> 
>> I'm tasked with creating a guide that instructs on how to choose a Hadoop
>> distribution from the handful of common options.  I'm finding this rather
>> perplexing.  While some of the vendors offer additional management software
>> (Cloudera Manager is an example), I'm unclear whether those packages could
>> be installed and run regardless of the underlying Hadoop distribution or
>> if they are exclusively compatible with their vendor's distribution (or if
>> there's some crossover).  I'm also unclear on any other basis for
>> comparison.  For example, Hortonworks originated HCatalog (to the best of my
>> understanding), but that doesn't necessarily mean one needs to use the HW
>> Hadoop dist. to use HCatalog since it's just a public Apache project anyway
>> at this point.  I'm sure similar statements could be made about MapR or
>> Greenplum (although I think Greenplum's Hadoop uses MapR's M5 anyway so
>> again, the decision-making process in such a case seems baffling).
>> 
>> And then there's the option of installing the Apache version directly,
>> always on the table I suppose.
>> 
>> Does anyone have any thoughts on what criteria might govern such a
>> decision?  I'm not trying to get into an argument about which distribution
>> is best, I'm not even looking for defenses or arguments for one
>> distribution or another, but rather a notion of what the criteria for
>> basing such a decision might be.
>> 
>> Thanks.
>> 
>> Cheers!


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
                                           --  Homer Simpson
________________________________________________________________________________

