hadoop-general mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Choosing a Hadoop distribution
Date Thu, 20 Sep 2012 18:06:59 GMT
I'm tasked with creating a guide that instructs on how to choose a Hadoop distribution from
the handful of common options.  I'm finding this rather perplexing.  While some of the vendors
offer additional management software (Cloudera Manager is an example), I'm unclear whether
those packages could be installed and run regardless of the underlying Hadoop distribution
or if they are exclusively compatible with their vendor's distribution (or if there's some
crossover).  I'm also unclear on any other basis for comparison.  For example, Hortonworks
originated HCatalog (to the best of my understanding), but that doesn't necessarily mean one
needs to use the HW Hadoop dist. to use HCatalog, since it's just a public Apache project anyway
at this point.  I'm sure similar statements could be made about MapR or Greenplum (although
I think Greenplum's Hadoop uses MapR's M5 anyway, so again, the decision-making process in such
a case seems baffling).

And then there's the option of installing the Apache version directly, always on the table
I suppose.

Does anyone have any thoughts on what criteria might govern such a decision?  I'm not trying
to get into an argument about which distribution is best, I'm not even looking for defenses
or arguments for one distribution or another, but rather a sense of what criteria such a
decision might be based on.



Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
