hadoop-general mailing list archives

From hadoop <hadoo...@gmail.com>
Subject Re: Choosing a Hadoop distribution
Date Fri, 21 Sep 2012 03:17:58 GMT
I have the same question.
Which version, and which vendor, should we choose?


--  
hadoop


On Friday, September 21, 2012 at 2:22 AM, Aaron Eng wrote:

> > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > distribution from the handful of common options.
> > Does anyone have any thoughts on what criteria might govern such a
> > decision?
>  
> What problem(s) are you trying to solve with Hadoop (and related projects)?
> What are your expectations of the technology?
>  
> The details beyond that level could take many, many pages to cover.
>  
> Not all Hadoop distributions are tested the same way, packaged with the
> same components, etc. Not all components of a given Hadoop distribution
> work with other Hadoop distributions. There is a lot in common between
> distributions, which is probably why it's difficult to articulate how to
> choose one over another. When you look at the problem you are trying to
> solve and your expectations of the technology, many things may seem
> roughly equal, so you may need to go into significant detail to pick the
> one that best solves your problem. In some cases it will be very
> straightforward to tell whether a distribution meets your requirements.
> In other cases, things may look roughly equal across the board until you
> drill down to a point where you find differentiation (or maybe you don't
> find it). But those would be my criteria: articulate the problem and your
> expectations, then compare functionality until you find differentiation.
>  
>  
>  
> On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
>  
> > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > distribution from the handful of common options. I'm finding this rather
> > perplexing. While some of the vendors offer additional management software
> > (Cloudera Manager is an example), I'm unclear whether those packages could
> > be installed and run regardless of the underlying Hadoop distribution or
> > whether they are exclusively compatible with their vendor's distribution
> > (or whether there's some crossover). I'm also unclear on any other basis
> > for comparison. For example, Hortonworks originated HCatalog (to the best
> > of my understanding), but that doesn't necessarily mean one needs to use
> > the Hortonworks Hadoop distribution to use HCatalog, since it's just a
> > public Apache project at this point anyway. I'm sure similar statements
> > could be made about MapR or Greenplum (although I think Greenplum's Hadoop
> > uses MapR's M5 anyway, so again, the decision-making process in such a
> > case seems baffling).
> >  
> > And then, I suppose, there's always the option of installing the Apache
> > version directly.
> >  
> > Does anyone have any thoughts on what criteria might govern such a
> > decision? I'm not trying to get into an argument about which distribution
> > is best; I'm not even looking for defenses of, or arguments for, one
> > distribution or another, but rather for a notion of the criteria on which
> > such a decision might be based.
> >  
> > Thanks.
> >  
> > Cheers!
> >  
> >  
> > ________________________________________________________________________________
> > Keith Wiley kwiley@keithwiley.com keithwiley.com
> > music.keithwiley.com
> >  
> > "It's a fine line between meticulous and obsessive-compulsive and a
> > slippery rope between obsessive-compulsive and debilitatingly slow."
> > -- Keith Wiley
> >  
> > ________________________________________________________________________________
 

