hadoop-common-user mailing list archives

From Torsten Curdt <tcu...@apache.org>
Subject Re: Overhead of Java?
Date Thu, 06 Sep 2007 08:37:31 GMT

On 06.09.2007, at 09:56, Pietu Pohjalainen wrote:

> Jeroen Verhagen wrote:
>> On 9/5/07, Steve Schlosser <swschlosser@gmail.com> wrote:
>>
>>> question, but I was wondering if anyone has a reasonable qualitative
>>> answer that I can pass on when people ask.
>>>
>> Is this question really relevant since Hadoop is designed to run on a
>> cluster of commodity hardware Google-style? If there were any
>> difference I'm sure it would be solved by adding 1 machine to the
>> cluster.
>>
>
>
> Isn't it about whether to add 30% or 50% more machines? That starts
> to get significant when you are deciding whether to have 1000 or
> 1500 machines.

A plain Java vs. <some language> discussion is far too simple. I've
been working on a Java project that way (!!) outperformed a similar
C++ project. The design and a smart implementation will make more
difference than the language alone. Long-running vs. short-running
processes and everything else that has already been said matter as
well. At least that's my experience. That being said, for Hadoop the
one-child-JVM-per-job model is what has quite a bit of overhead. If
you are not scared that your jobs will tear down your tasktrackers,
we have an in-JVM execution patch (not submitted yet, though).
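
To put rough numbers on the long-running vs. short-running point, here
is a back-of-the-envelope sketch in Java. The 1-second JVM startup cost
and the work durations are made-up figures for illustration, not
measurements from Hadoop, and the class and method names are
hypothetical.

```java
// Back-of-the-envelope model: what share of wall-clock time goes to
// starting a fresh JVM, as a function of how long the actual work runs.
// All numbers below are hypothetical, not measured on a real cluster.
class JvmOverheadModel {

    // Fraction of total time spent on JVM startup for one unit of work.
    static double overheadFraction(double startupSeconds, double workSeconds) {
        return startupSeconds / (startupSeconds + workSeconds);
    }

    public static void main(String[] args) {
        double startup = 1.0; // assumed JVM startup cost in seconds
        double[] workDurations = {1.0, 9.0, 99.0, 999.0}; // assumed work per JVM

        for (double work : workDurations) {
            System.out.printf("work=%6.1fs  startup share=%5.1f%%%n",
                    work, 100.0 * overheadFraction(startup, work));
        }
    }
}
```

With these assumed figures the startup share falls from 50% for one
second of work to 0.1% for work that runs roughly 17 minutes, which is
why spawning a JVM each time hurts short-running work the most.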

cheers
--
Torsten
