hadoop-general mailing list archives

From Steve Loughran <steve.lough...@gmail.com>
Subject Re: WG: Choosing a Hadoop distribution
Date Mon, 24 Sep 2012 19:31:39 GMT
On 24 September 2012 14:41, Marcos Ortiz <mlortiz@uci.cu> wrote:

>
> On 09/24/2012 06:29 AM, Christian Schäfer wrote:
>
>> I think a good starting point for that distribution guide would be a
>> feature matrix where all reasonable distributions could be compared.
>>
> +1 for this idea
> I think that this feature matrix will be on the Hadoop wiki.
>
>
That gets too controversial.

I wouldn't be completely dismissive of Apache Hadoop 1.0.3; it went through
large-cluster QA by the QA team at Hortonworks (disclaimer: my colleagues);
the 1.x branch is going to be long-lived and is in use in production.


>
>>
>> There could be metrics for cross cutting concerns like performance,
>> security, etc. referring to real benchmarks.
>> From this one could derive (maybe with additional explanations) which
>> distribution best fits a given use case.
>>
> Umm, this is tricky. How can we decide which is the best fit for a certain
> type of problem?
> My suggestion is to avoid this, because it will bring some heated
> discussions and that's not the idea.
> It's my personal opinion.
>

What would be good would be more traces of real-world cluster use, stuff
that can be fed into the Gridmix 3 benchmarker
[http://developer.yahoo.com/blogs/hadoop/posts/2010/04/gridmix3_emulating_production/].
If your workload gets pulled into the performance tests used by the
Hadoop development teams...
