hadoop-general mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: WG: Choosing a Hadoop distribution
Date Mon, 24 Sep 2012 13:41:53 GMT

On 09/24/2012 06:29 AM, Christian Schäfer wrote:
> I think a good starting point for that distribution guide would be a feature matrix where
> all reasonable distributions could be compared.
+1 for this idea.
I think this feature matrix belongs on the Hadoop wiki.
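
To make the feature matrix idea concrete, here is a minimal sketch of how such a matrix could be represented and queried. Everything in it is an assumption for illustration: the distribution names (dist_a, dist_b, dist_c), the feature names, and the values are placeholders and say nothing about any real distribution.

```python
# Hypothetical feature matrix. Distribution names, feature names, and values
# are placeholders, not claims about any real Hadoop distribution.
FEATURE_MATRIX = {
    "dist_a": {"management_ui": True, "hcatalog": True, "support": "commercial"},
    "dist_b": {"management_ui": False, "hcatalog": True, "support": "community"},
    "dist_c": {"management_ui": True, "hcatalog": True, "support": "commercial"},
}


def differentiating_features(matrix):
    """Return only the features whose values differ between distributions."""
    all_features = sorted({f for row in matrix.values() for f in row})
    result = {}
    for feature in all_features:
        # A feature missing from a row is recorded as None.
        values = {name: row.get(feature) for name, row in matrix.items()}
        if len(set(values.values())) > 1:
            result[feature] = values
    return result


def meets_requirements(matrix, requirements):
    """Return the distributions whose features match every stated requirement."""
    return sorted(
        name
        for name, row in matrix.items()
        if all(row.get(key) == wanted for key, wanted in requirements.items())
    )


if __name__ == "__main__":
    print(differentiating_features(FEATURE_MATRIX))
    print(meets_requirements(FEATURE_MATRIX, {"management_ui": True}))
```

A matrix like this only reports where distributions differ and which ones satisfy stated requirements; it deliberately does not rank them.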

>
>
> There could be metrics for cross-cutting concerns like performance, security, etc., referring
> to real benchmarks.
> From this one could derive (maybe with additional explanations) which distribution fits
> a certain use case best.
Hmm, this is tricky. How can we decide which distribution is the best fit
for a certain type of problem?
My suggestion is to avoid this, because it will lead to heated
discussions, and that's not the goal.
That's just my personal opinion.
>
> Most important, though, is that this comparison is independent and unbiased.
>
> regards
> Chris
>
>
> ________________________________
> From: Keith Wiley <kwiley@keithwiley.com>
> To: general@hadoop.apache.org
> Sent: 5:45 Friday, 21 September 2012
> Subject: Re: Choosing a Hadoop distribution
>
> Thanks, that all seems quite reasonable I suppose.
>
> Cheers!
>
> On Sep 20, 2012, at 11:22 , Aaron Eng wrote:
>
>>> I'm tasked with creating a guide that instructs on how to choose a Hadoop
>>> distribution from the handful of common options.
>>> Does anyone have any thoughts on what criteria might govern such a
>>> decision?
>>
>> What problem(s) are you trying to solve with Hadoop (and related projects)?
>> What are your expectations of the technology?
>>
>> The details beyond that level could take many, many pages to cover.
>>
>> Not all Hadoop distributions are tested the same way, packaged with the
>> same components, etc.  Not all components of a given Hadoop distribution
>> work with other Hadoop distributions.  There are a lot of common things
>> between distributions, which is probably why it's difficult to articulate how
>> to choose one over another.  So when you look at the problem you are
>> trying to solve and your expectations of the technology, many things may
>> seem relatively equal and hence you may need to get into some significant
>> level of detail to pick something that best solves your problem.  In some
>> cases it may be very straightforward as to whether a distribution will meet
>> your requirements.  In other cases, things may look relatively equal across
>> the board until you drill down to a point where you find differentiation
>> (or maybe you don't find it).  But those would be my criteria: articulate the
>> problem and expectations and compare functionality until you find
>> differentiation.
>>
>>
>> On Thu, Sep 20, 2012 at 11:06 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
>>
>>> I'm tasked with creating a guide that instructs on how to choose a Hadoop
>>> distribution from the handful of common options.  I'm finding this rather
>>> perplexing.  While some of the vendors offer additional management software
>>> (Cloudera Manager is an example), I'm unclear whether those packages could
>>> be installed and run regardless of the underlying Hadoop distribution or
>>> if they are exclusively compatible with their vendor's distribution (or if
>>> there's some crossover).  I'm also unclear on any other basis for
>>> comparison.  For example HortonWorks originated HCatalog (to the best of my
>>> understanding), but that doesn't necessarily mean one needs to use the HW
>>> Hadoop dist. to use HCatalog since it's just a public Apache project anyway
>>> at this point.  I'm sure similar statements could be made about MapR or
>>> Greenplum (although I think Greenplum's Hadoop uses MapR's M5 anyway so
>>> again, the decision-making process in such a case seems baffling).
>>>
>>> And then there's the option of installing the Apache version directly,
>>> always on the table I suppose.
>>>
>>> Does anyone have any thoughts on what criteria might govern such a
>>> decision?  I'm not trying to get into an argument about which distribution
>>> is best, I'm not even looking for defenses or arguments for one
>>> distribution or another, but rather a notion of what the criteria for
>>> basing such a decision might be.
>>>
>>>
>>> Thanks.
>>> Cheers!
>
> ________________________________________________________________________________
> Keith Wiley    kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
>
> "And what if we picked the wrong religion?  Every week, we're just making God
> madder and madder!"
>                                             --  Homer Simpson
> ________________________________________________________________________________
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

-- 

Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
about.me/marcosortiz <http://about.me/marcosortiz>
My Blog <http://marcosluis2186.posterous.com>
Tumblr's blog <http://marcosortiz.tumblr.com/>
@marcosluis2186 <http://twitter.com/marcosluis2186>


