hadoop-general mailing list archives

From Aaron Eng <a...@maprtech.com>
Subject Re: Choosing a Hadoop distribution
Date Thu, 20 Sep 2012 18:22:01 GMT
>I'm tasked with creating a guide that instructs on how to choose a Hadoop
distribution from the handful of common options.
>Does anyone have any thoughts on what criteria might govern such a
decision?

What problem(s) are you trying to solve with Hadoop (and related projects)?
What are your expectations of the technology?

The details beyond that level could take many, many pages to cover.

Not all Hadoop distributions are tested the same way, packaged with the
same components, etc.  Not all components of a given Hadoop distribution
work with other Hadoop distributions.  At the same time, distributions have
a lot in common, which is probably why it's difficult to articulate how to
choose one over another.  So when you look at the problem you are trying to
solve and your expectations of the technology, many options may seem
roughly equal, and you may need to get into a significant level of detail
to pick the one that best solves your problem.  In some cases it will be
very straightforward to tell whether a distribution meets your
requirements.  In other cases, things may look relatively equal across the
board until you drill down to the point where you find differentiation (or
maybe you don't find it).  But those would be my criteria: articulate the
problem and expectations, then compare functionality until you find
differentiation.



On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwiley@keithwiley.com> wrote:

> I'm tasked with creating a guide that instructs on how to choose a Hadoop
> distribution from the handful of common options.  I'm finding this rather
> perplexing.  While some of the vendors offer additional management software
> (Cloudera Manager is an example) I'm unclear whether those packages could
> be installed and run regardless of the underlying Hadoop distribution or
> if they are exclusively compatible with their vendor's distribution (or if
> there's some crossover).  I'm also unclear on any other basis for
> comparison.  For example Hortonworks originated HCatalog (to the best of my
> understanding), but that doesn't necessarily mean one needs to use the HW
> Hadoop dist. to use HCatalog since it's just a public Apache project anyway
> at this point.  I'm sure similar statements could be made about MapR or
> Greenplum (although I think Greenplum's Hadoop uses MapR's M5 anyway so
> again, the decision-making process in such a case seems baffling).
>
> And then there's the option of installing the Apache version directly,
> always on the table I suppose.
>
> Does anyone have any thoughts on what criteria might govern such a
> decision?  I'm not trying to get into an argument about which distribution
> is best, I'm not even looking for defenses or arguments for one
> distribution or another, but rather a notion of what the criteria for
> basing such a decision might be.
>
> Thanks.
>
> Cheers!
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a
> slippery rope between obsessive-compulsive and debilitatingly slow."
>                                            --  Keith Wiley
>
> ________________________________________________________________________________
>
>
