hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Hadoop streaming performance problem
Date Tue, 01 Apr 2008 01:21:05 GMT

My experiences with Groovy are similar.  Noticeable slowdown, but quite
bearable (almost always better than 50% of best attainable speed).

The highest virtue is that simple programs become simple again.  Word count
is < 5 lines of code.
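
For a sense of scale, here is a minimal sketch of word count written in the streaming style in Python. It is not the Groovy code referred to above, which does not appear in this thread. The function names `mapper` and `reducer` are illustrative assumptions, and the `sorted()` call only stands in locally for the sort that the framework performs between the map and reduce phases.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one (word, 1) pair per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word; pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Local simulation only: sorted() plays the role of the shuffle/sort step.
    for word, total in reducer(sorted(mapper(sys.stdin))):
        print(f"{word}\t{total}")
```

In a real streaming job the mapper and reducer would normally be separate scripts reading tab-separated lines from stdin; they are combined here so the sketch can be run on a local text file.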


On 3/31/08 6:10 PM, "Colin Evans" <colin@metaweb.com> wrote:

> At Metaweb, we did a lot of comparisons between streaming (using Python)
> and native Java, and in general streaming performance was not much
> slower than native Java. Most of the slowdown came from Python
> being a slow language.
> The main problems with streaming apps that we found are that they are
> hard to write and there are many ways that you can make simple mistakes
> in streaming that slow down performance.
> We've been experimenting with embedding JavaScript (Rhino) and Jython
> for writing jobs, and have found that performance is good and the apps
> are much easier to write.  The tight Java integration means that
> performance bottlenecks get rewritten in Java with little sacrifice to
> development speed.  One of these days we'll open source these frameworks.
>
> Parand Darugar wrote:
>> Travis Brady wrote:
>>> This brings up two interesting issues:
>>> 1. Hadoop streaming is a potentially very powerful tool, especially for
>>> those of us who don't work in Java for whatever reason
>>> 2. If Hadoop streaming is "at best a jury rigged solution" then that
>>> should be made known somewhere on the wiki.  If it's really not supposed
>>> to be used, why is it provided at all?
>> A set of reasonable performance tests and results would help people
>> decide whether to go with streaming or not.
>> Hopefully we can get some numbers from this thread and publish them?
>> Anyone else compared streaming with native Java?
>> Best,
>> Parand
