hadoop-common-user mailing list archives

From robert <robert-hamil...@austin.rr.com>
Subject Re: Hadoop pain points?
Date Sun, 04 Mar 2012 18:40:06 GMT
2012/3/2 Kunaal wrote:
> I am doing a general poll on what are the most prevalent pain points that
> people run into with Hadoop? These could be performance related (memory
> usage, IO latencies), usage related or anything really.
>

My biggest frustration with core Hadoop over the last year or so has
been the lack of an efficient way to implement the so-called
"analytic functions" in general with MapReduce.

These are not what one would guess from the name alone, by the way;
see the analytic functions in Oracle SQL (lead(), lag(), rank() and
friends) for an example of what I mean. The big advantage is that they
often let you avoid expensive self-joins, which can make a huge
difference performance-wise.
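
To make the self-join point concrete, here is a minimal sketch in plain Python (not Hadoop or Hive code). The rows, column layout, and function names are all made up for illustration. It computes "previous day's value per user" two ways: once by scanning the data against itself, and once in a single ordered pass, which is roughly what lag() does.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (user, day, value). Not taken from any real dataset.
ROWS = [
    ("a", 1, 10),
    ("a", 2, 15),
    ("a", 3, 12),
    ("b", 1, 7),
    ("b", 2, 9),
]


def lag_via_self_join(rows):
    """Find each row's previous-day value by matching the data against itself."""
    out = []
    for user, day, value in rows:
        prev = None
        # Second scan over the same rows: this is the self-join.
        for other_user, other_day, other_value in rows:
            if other_user == user and other_day == day - 1:
                prev = other_value
        out.append((user, day, value, prev))
    return sorted(out, key=itemgetter(0, 1))


def lag_via_window(rows):
    """Same result in one pass: sort by (user, day), carry the last value along."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, partition in groupby(ordered, key=itemgetter(0)):
        prev = None  # reset at the start of each user's partition
        for user, day, value in partition:
            out.append((user, day, value, prev))
            prev = value
    return out


if __name__ == "__main__":
    for row in lag_via_window(ROWS):
        print(row)
```

The two agree here only because the hypothetical days are contiguous; the join matches on `day - 1` while the window version takes whatever row came before. The point is the shape of the work: one sort plus one pass instead of matching the dataset against itself.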

(I would say that 80% of the analytic functions can be implemented with
a UDF or a UDA in Hive, things like lead() or lag() or first() or
rank(), but it is the other 20% that would knock the ball out of the park.)
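
As a rough illustration of why functions like rank() fall in the easy 80%: once the values for one partition arrive already sorted, the function is a single loop with a little state. The sketch below is plain Python with a hypothetical function name, not Hive's UDF interface, and it assumes SQL-style RANK() semantics (ties share a rank, and the next rank skips ahead).

```python
def rank_partition(sorted_scores):
    """Assign RANK()-style ranks to scores that are already sorted.

    Ties share a rank; the rank after a tie skips ahead by the tie size.
    """
    ranked = []
    previous = object()  # sentinel that never equals a real score
    rank = 0
    for position, score in enumerate(sorted_scores, start=1):
        if score != previous:
            rank = position  # new distinct value: rank jumps to its position
            previous = score
        ranked.append((score, rank))
    return ranked


if __name__ == "__main__":
    print(rank_partition([90, 85, 85, 70]))
```

The only state carried between rows is the previous score and the current rank, which is why this kind of function fits comfortably in a per-partition user-defined function.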


