hadoop-general mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Choosing a Hadoop distribution
Date Fri, 21 Sep 2012 05:44:47 GMT
I would add a couple more points for your consideration (maybe this is just
me):
  - vendor lock-in:
    
    - when you pick a software product, make sure that you'd be able to move
      over to a different (yet similar) offering if you need to.  You are asking
      about CDH's CM here: I don't think it would work with anything else but
      CDH (I am not working there, so I don't know for sure - but it seems
      like a reasonable assumption).
    
    - HW's HDP provides Ambari for cluster management needs; it is a
      completely open source technology that you can master if needed and most
      likely use with other stacks based on Hadoop (as far as I can see).

    - MapR has quite a few proprietary components in their stack, which
      might or might not be beneficial in your particular case: this is
      something you have to decide for yourself.

  - what is the road-map of each candidate distribution? Will it have what you
    need in the future? A case in point is these guys
        http://www.magnatempusgroup.net/blog/2012/09/05/whats-cooking/
    who are seemingly bringing in-memory analytics into their upcoming
    release. You might want to follow a big Hadoop conference next month,
    which is likely to have a number of interesting announcements (otherwise,
    what would be the point of such a conference ;)

These two would be the pivotal points for me. Hope it helps,
  Cos

On Fri, Sep 21, 2012 at 11:17AM, hadoop wrote:
> I have the same question.
> Which version and which vendor do we choose?
> 
> 
> --  
> hadoop
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 
> On Friday, September 21, 2012 at 2:22 AM, Aaron Eng wrote:
> 
> > > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > > distribution from the handful of common options.
> > > Does anyone have any thoughts on what criteria might govern such a
> > > decision?
> >  
> > What problem(s) are you trying to solve with Hadoop (and related projects)?
> > What are your expectations of the technology?
> >  
> > The details beyond that level could take many, many pages to cover.
> >  
> > Not all Hadoop distributions are tested the same way, packaged with the
> > same components, etc. Not all components of a given Hadoop distribution
> > work with other Hadoop distributions. There are a lot of common things
> > between distributions, which is probably why it's difficult to articulate how
> > to choose one over another. So when you look at the problem you are
> > trying to solve and your expectations of the technology, many things may
> > seem relatively equal and hence you may need to get into some significant
> > level of detail to pick something that best solves your problem. In some
> > cases it may be very straightforward as to whether a distribution will meet
> > your requirements. In other cases, things may look relatively equal across
> > the board until you drill down to a point where you find differentiation
> > (or maybe you don't find it). But those would be my criteria: articulate the
> > problem and expectations and compare functionality until you find
> > differentiation.
> >  
> >  
> >  
> > On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
> >  
> > > I'm tasked with creating a guide that instructs on how to choose a Hadoop
> > > distribution from the handful of common options. I'm finding this rather
> > > perplexing. While some of the vendors offer additional management software
> > > (Cloudera Manager is an example) I'm unclear whether those packages could
> > > be installed and run regardless of the underlying Hadoop distribution or
> > > if they are exclusively compatible with their vendor's distribution (or if
> > > there's some crossover). I'm also unclear on any other basis for
> > > comparison. For example HortonWorks originated HCatalog (to the best of my
> > > understanding), but that doesn't necessarily mean one needs to use the HW
> > > Hadoop dist. to use HCatalog since it's just a public Apache project anyway
> > > at this point. I'm sure similar statements could be made about MapR or
> > > Greenplum (although I think Greenplum's Hadoop uses MapR's M5 anyway so
> > > again, the decision-making process in such a case seems baffling).
> > >  
> > > And then there's the option of installing the Apache version directly,
> > > always on the table I suppose.
> > >  
> > > Does anyone have any thoughts on what criteria might govern such a
> > > decision? I'm not trying to get into an argument about which distribution
> > > is best, I'm not even looking for defenses or arguments for one
> > > distribution or another, but rather a notion of what the criteria for
> > > basing such a decision might be.
> > >  
> > > Thanks.
> > >  
> > > Cheers!
> > >  
> > >  
> > > ________________________________________________________________________________
> > > Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> > > music.keithwiley.com
> > >  
> > > "It's a fine line between meticulous and obsessive-compulsive and a slippery
> > > rope between obsessive-compulsive and debilitatingly slow."
> > > -- Keith Wiley
> > >  
> > > ________________________________________________________________________________
 
> 
