hadoop-common-user mailing list archives

From Aaron Kimball ...@cs.washington.edu>
Subject Re: sort speeds under java, c++, and streaming
Date Fri, 09 Nov 2007 01:11:08 GMT
Neat benchmark. I've been meaning to do exactly that myself. And that is 
a surprise about Pipes!

Thanks for the data
- Aaron

Owen O'Malley wrote:
> I set up a little benchmark on a 39-node cluster to sort 40 GB of random 
> text data (generated by RandomTextWriter using key length: 1-10 words 
> and value length: 0-200 words, data uncompressed). The runtimes 
> (minutes:seconds) are:
> 
> Java:         4:22
> C++ (Pipes):  3:50
> Streaming:    4:44
> 
> I was surprised to find that Pipes outperformed Java, even with the 
> extra process. I suspect it was because of the buffering between the 
> input and output of Pipes.
> 
> -- Owen
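
For anyone who wants the gaps as ratios rather than wall-clock times, here is a rough sketch that converts the three runtimes to seconds and compares them against the Java run. The helper names (`to_seconds`, `summarize`) are made up for illustration, and the per-node throughput assumes "40 GB" means 40,000 MB spread evenly over all 39 nodes, which the original post does not state.

```python
# Runtimes quoted in the post, as "minutes:seconds".
RUNTIMES = {"Java": "4:22", "C++ (Pipes)": "3:50", "Streaming": "4:44"}

DATA_MB = 40 * 1000  # assumption: 1 GB = 1000 MB
NODES = 39           # assumption: every node shares the work evenly


def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(runtimes=RUNTIMES, baseline="Java"):
    """Return seconds, runtime relative to the baseline, and MB/s per node."""
    base = to_seconds(runtimes[baseline])
    rows = {}
    for name, mmss in runtimes.items():
        secs = to_seconds(mmss)
        rows[name] = {
            "seconds": secs,
            "relative_to_baseline": secs / base,
            "mb_per_sec_per_node": DATA_MB / secs / NODES,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(
            f"{name:12s} {row['seconds']:4d} s  "
            f"{row['relative_to_baseline']:.2f}x Java  "
            f"{row['mb_per_sec_per_node']:.2f} MB/s per node"
        )
```

By this arithmetic the Pipes run took 230 s against 262 s for Java, and Streaming took 284 s. These are single runs, so small differences may be within noise.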
