hadoop-common-user mailing list archives

From Chris Dyer <redp...@gmail.com>
Subject Re: Why hadoop is written in java?
Date Tue, 12 Oct 2010 04:20:00 GMT
The Java memory overhead is a serious problem and a legitimate
criticism of Hadoop. For MapReduce applications, it is
often (although not always) possible to improve performance by doing
more work in memory (e.g., using combiners and the like) before
emitting data. Thus, the more memory available to your application,
the more efficiently it runs. Therefore, if you have a framework that
locks up 500 MB rather than 50 MB, you systematically get less
performance out of your cluster.
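
To make the memory argument concrete, here is a minimal sketch of doing
combiner-style work in memory before emitting. It is plain Java and does not
use the Hadoop API; the class name InMemoryCombiner, the maxEntries budget,
and the emitter callback are all made up for illustration. The idea is only
that a larger in-memory buffer lets more duplicate keys be merged before
anything is written out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Hadoop: sums counts per key in memory
// and only emits when the buffer exceeds a fixed entry budget.
final class InMemoryCombiner {
    private final Map<String, Integer> buffer = new HashMap<>();
    private final int maxEntries;                       // stand-in for the memory budget
    private final BiConsumer<String, Integer> emitter;  // stand-in for the framework's output call
    private int emitted = 0;

    InMemoryCombiner(int maxEntries, BiConsumer<String, Integer> emitter) {
        this.maxEntries = maxEntries;
        this.emitter = emitter;
    }

    // Merge the count into the buffer; flush once the budget is exceeded.
    void add(String key, int count) {
        buffer.merge(key, count, Integer::sum);
        if (buffer.size() > maxEntries) {
            flush();
        }
    }

    // Emit every buffered (key, partial sum) pair and clear the buffer.
    // Call this once more at the end of the input.
    void flush() {
        buffer.forEach((k, v) -> {
            emitter.accept(k, v);
            emitted++;
        });
        buffer.clear();
    }

    int emittedRecords() {
        return emitted;
    }
}
```

With a small maxEntries the buffer flushes often and the same key gets
emitted many times as separate partial sums; with a larger one, more of the
summing happens before anything leaves the task. That is the sense in which
memory held by the framework is memory the application cannot use to cut
down its output.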

The second issue is that C/C++ bindings are common and widely used
from many languages, but it is not generally possible to interface
directly with Java (or Java libraries) from another language, unless
that language is also built on top of the JVM. This is very
unfortunate because many problems that would be quite naturally
expressed in MapReduce are better solved in non-JVM languages.

But Java is what we have, and it works well enough for many things.

On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> I agree with others in this list that Java provides faster software
> development, the IO cost in Java is practically the same as in C/C++, etc.
> In short, most pieces of distributed software can be written in Java without
> any performance hiccups, as long as it is only system metadata that is
> handled by Java.
>
> One problem is when data-flow has to occur in Java. Each record that is read
> from the storage has to be de-serialized, uncompressed and then processed.
> This processing could be very slow in Java compared to when written in other
> languages, especially because of the creation/destruction of too many
> objects.  It would have been nice if the map/reduce task could have been
> written in C/C++, or better still, if the sorting inside the MR framework
> could occur in C/C++.
>
> thanks,
> dhruba
>
> On Mon, Oct 11, 2010 at 4:50 PM, helwr <helwyr@gmail.com> wrote:
>
>>
>> Check out this thread:
>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
