hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: sort speeds under java, c++, and streaming
Date Fri, 09 Nov 2007 01:35:53 GMT
Doesn't the sorting and merging all still happen in Java-land?

-----Original Message-----
From: Owen O'Malley [mailto:oom@yahoo-inc.com] 
Sent: Thursday, November 08, 2007 5:03 PM
To: hadoop-user@lucene.apache.org
Subject: sort speeds under java, c++, and streaming

I set up a little benchmark on a 39-node cluster to sort 40 GB of
random text data (generated by RandomTextWriter using key length:
1-10 words and value length: 0-200 words, data uncompressed). The
runtimes in minutes are:

Java:			4:22
C++ (Pipes):		3:50
Streaming:		4:44

I was surprised to find that Pipes outperformed Java, even with the
extra process. I suspect it was because of the buffering between the
input and output of Pipes.

-- Owen
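
For readers comparing the three runtimes quoted above, here is a minimal sketch of the arithmetic. It assumes the figures are minutes:seconds, which the message does not state explicitly. The helper names (to_seconds, relative_to_java) are illustrative only and are not part of Hadoop or the original benchmark.

```python
# Runtimes as reported in the message; the mm:ss reading is an assumption.
runtimes = {"Java": "4:22", "C++ (Pipes)": "3:50", "Streaming": "4:44"}


def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_to_java(name: str) -> float:
    """Percentage difference in runtime versus the Java job (negative = faster)."""
    baseline = to_seconds(runtimes["Java"])
    return (to_seconds(runtimes[name]) - baseline) / baseline * 100


for name, reported in runtimes.items():
    print(f"{name}: {to_seconds(reported)} s, {relative_to_java(name):+.1f}% vs Java")
```

Under that reading, the Pipes job finishes roughly 12% sooner than the Java job and the Streaming job takes roughly 8% longer. These are single-run numbers, so the differences should be treated as indicative rather than precise.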
