hadoop-common-user mailing list archives

From "Taeho Kang" <tka...@gmail.com>
Subject Re: MapReduce in C++ vs MapReduce in Java
Date Fri, 14 Sep 2007 01:13:20 GMT
Thanks for your answers and clarifications.
I will try to do some more benchmark testing with more nodes and keep you
guys posted.



On 9/14/07, Owen O'Malley <oom@yahoo-inc.com> wrote:
>
>
> On Sep 13, 2007, at 2:20 AM, Taeho Kang wrote:
>
> > I did run WordCount included in 0.14.1 release version on a 1 node
> > Hadoop
> > cluster (Pentium D with 2GB of RAM).
>
> Thanks for running the benchmark. I'm afraid that with such a small
> cluster and data size, you are getting swamped by the start-up costs.
> I have not done enough benchmarking of the C++ bindings.
>
> > There were 2 input files (one 4.5MB file + one 36MB file).
> > I also did take Combiner out of Java version WordCount MapReduce,
> > as there
> > was no Combiner used for C++ version.
>
> Actually, the wordcount-part.cc example does have a combiner. You
> would, however, want to remove the partitioner from that example,
> since it forces every key to partition 0. *smile* In hindsight, the
> bad partitioner wasn't a good idea as an example; I should move it
> to a test case.
>
> > The result is... as many of you have guessed, the Java version won
> > the race big time. It was about 4 times faster.
>
> I'll write a sort benchmark for C++ so that we can run a reasonably
> large program. Note that for simple programs, the C++ version is by
> definition slower, since Pipes runs the C++ as a subprocess underneath
> a Java mapper and reducer.
>
> -- Owen
>
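To see why the partitioner Owen mentions hurts, here is a minimal sketch in plain JDK Java (no Hadoop dependencies; `PartitionDemo` and both methods are hypothetical names). It contrasts hash-style partitioning, which spreads keys across reduces the way Hadoop's default HashPartitioner does, with a constant-0 partitioner like the one in wordcount-part.cc, which funnels every key to a single reduce and serializes the reduce phase:

```java
import java.util.Arrays;

public class PartitionDemo {
    // Mimics the logic of Hadoop's default HashPartitioner:
    // mask off the sign bit, then mod by the number of reduces.
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Mimics the "bad" partitioner: every key goes to partition 0,
    // so one reduce does all the work while the others sit idle.
    static int constantPartition(String key, int numReduces) {
        return 0;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "brown", "fox",
                          "jumps", "over", "lazy", "dog"};
        int numReduces = 4;
        int[] hashCounts = new int[numReduces];
        int[] constCounts = new int[numReduces];
        for (String w : words) {
            hashCounts[hashPartition(w, numReduces)]++;
            constCounts[constantPartition(w, numReduces)]++;
        }
        // With hashing, counts spread across the 4 partitions;
        // with the constant partitioner, all 8 land in partition 0.
        System.out.println("hash:  " + Arrays.toString(hashCounts));
        System.out.println("const: " + Arrays.toString(constCounts));
    }
}
```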
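The combiner difference matters for the same kind of reason. A rough sketch, again in plain JDK Java with hypothetical names (`CombinerDemo` and its methods are not part of Hadoop): without a combiner a map task emits one (word, 1) pair per token, while local combining emits one (word, count) pair per distinct word, shrinking the data shuffled to the reduces:

```java
import java.util.*;

public class CombinerDemo {
    // No combiner: one (word, 1) pair per token in the input line.
    static List<Map.Entry<String, Integer>> mapNoCombine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+"))
            out.add(Map.entry(w, 1));
        return out;
    }

    // With local combining: one (word, count) pair per distinct word.
    static Map<String, Integer> mapWithCombine(String line) {
        Map<String, Integer> out = new HashMap<>();
        for (String w : line.split("\\s+"))
            out.merge(w, 1, Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        String line = "to be or not to be";
        System.out.println("pairs without combiner: "
                + mapNoCombine(line).size());   // 6 tokens
        System.out.println("pairs with combiner:    "
                + mapWithCombine(line).size()); // 4 distinct words
    }
}
```

On real word-count inputs the gap is much larger than 6 vs. 4, which is why removing the combiner from only one of the two versions skews a comparison.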



-- 
Taeho Kang [tkang.blogspot.com]
Software Engineer, NHN Corporation, Korea
