hadoop-common-dev mailing list archives

From Patrick McCormack <pnmccorm...@yahoo.com>
Subject Re: Google Terasort Benchmark
Date Sat, 22 Nov 2008 14:07:05 GMT

I reckon it's all about spindles. I took a quick look at the fairly detailed hardware config
that Owen released with the Hadoop benchmark: it was run on nodes with 4 SATA drives each.
The Google blog hints at 12 disks per node (the number of disks/nodes was only given for their
1PB experiment). So Google got roughly a 3x performance increase with 3x the number of disks per node.
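
A rough back-of-the-envelope check of the spindle argument, as a sketch only. The node counts and run times are the ones quoted from the Google blog further down this thread. The 12 disks per node for the 1TB run is an assumption (it was only stated for the 1PB run), 1TB is taken as 10^12 bytes, and the class and field names are made up for illustration.

```python
from dataclasses import dataclass

TERABYTE = 10**12  # bytes; assumes the decimal definition of 1TB


@dataclass
class SortRun:
    """One 1TB sort run. All names here are hypothetical, for illustration."""
    name: str
    nodes: int
    disks_per_node: int
    seconds: float

    @property
    def total_disks(self) -> int:
        return self.nodes * self.disks_per_node

    @property
    def mb_sorted_per_disk_per_sec(self) -> float:
        # Sorted output per spindle per second. This is not raw disk I/O,
        # since a sort reads and writes the data more than once.
        return TERABYTE / self.seconds / self.total_disks / 1e6


# 4 SATA drives per node is from the published Hadoop benchmark config.
hadoop = SortRun("Hadoop", nodes=910, disks_per_node=4, seconds=209)
# ASSUMPTION: 12 disks per node, which Google only stated for the 1PB run.
google = SortRun("Google", nodes=1000, disks_per_node=12, seconds=68)

speedup = hadoop.seconds / google.seconds
disk_ratio = google.total_disks / hadoop.total_disks

if __name__ == "__main__":
    for run in (hadoop, google):
        print(f"{run.name}: {run.total_disks} disks, "
              f"{run.mb_sorted_per_disk_per_sec:.2f} MB sorted per disk per second")
    print(f"speedup: {speedup:.2f}x, disk ratio: {disk_ratio:.2f}x")
```

If the 12-disk assumption holds, the total disk count grows by about the same factor as the speedup, so the sorted throughput per spindle comes out roughly the same for both runs.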


Patrick.




________________________________
From: Tom White <tom.e.white@gmail.com>
To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
Sent: Saturday, November 22, 2008 1:26:25 AM
Subject: Google Terasort Benchmark

From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html

"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."

Something for the Hadoop community to aim for: a threefold performance increase.

Tom



      