hadoop-common-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Map performance with custom binary format
Date Tue, 28 Jul 2009 20:25:49 GMT
On Tue, Jul 28, 2009 at 12:15 PM, william kinney wrote:

> Also, from the job page (different job, same Map method, just more
> data...~40GB. 781 files):
> Map input records       629,738,080
> Map input bytes         41,538,992,880
> Anything else I can look into?

Yes.  Look at the number of data-local maps and the total number of maps.
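
A rough sketch of that check, assuming you copy the "Data-local map tasks",
"Rack-local map tasks" and "Launched map tasks" counters by hand from the job
page. The function name and the counter values below are hypothetical and are
not taken from the job in this thread.

```python
def locality_report(data_local, rack_local, launched):
    """Return the fraction of launched map tasks in each locality class."""
    if launched <= 0:
        raise ValueError("launched must be positive")
    # Whatever is neither data-local nor rack-local read its input remotely.
    other = launched - data_local - rack_local
    return {
        "data_local": data_local / launched,
        "rack_local": rack_local / launched,
        "other": other / launched,
    }


# Hypothetical counter values, NOT measured on the cluster discussed here.
report = locality_report(data_local=700, rack_local=60, launched=781)
for name, fraction in report.items():
    print(f"{name}: {fraction:.1%}")
```

If the data-local fraction turns out to be low, most of the input is being
pulled over the network and the per-node numbers would not be comparable to a
local-disk test.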

> Do my original numbers (only 2x performance) jump out at you as being
> way off? Or is it common to see that with a setup similar to mine?

It is way off.  My experience is that from 5 EC2 nodes, I can sustain
100-200 MB/s to the *network*.  These are lesser machines than you have and
you have twice as many.  Moreover, your test program is nicely designed to
avoid all of the overhead attendant on running a full program.  It is
reasonable to expect a significant slowdown due to startup and due to going
through HDFS, but for local blocks I would expect good performance.
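
For a sense of scale, here is a back-of-the-envelope sketch using the two
counters quoted above and the 100-200 MB/s figure for 5 nodes. It assumes
decimal megabytes, and the function names are made up for illustration. It
does not use the actual job duration, which is not given in this thread.

```python
MAP_INPUT_BYTES = 41_538_992_880   # "Map input bytes" from the quoted job page
MAP_INPUT_RECORDS = 629_738_080    # "Map input records" from the quoted job page
MB = 1_000_000                     # assumption: decimal megabytes


def avg_record_bytes(total_bytes, records):
    """Average size of one input record in bytes."""
    return total_bytes / records


def scan_seconds(total_bytes, mb_per_s):
    """Time to stream total_bytes at a sustained aggregate rate."""
    return total_bytes / (mb_per_s * MB)


def per_node_rate(aggregate_mb_per_s, nodes):
    """Aggregate rate divided evenly across nodes."""
    return aggregate_mb_per_s / nodes


print(f"average record: {avg_record_bytes(MAP_INPUT_BYTES, MAP_INPUT_RECORDS):.1f} bytes")
for rate in (100, 200):
    print(f"{rate} MB/s aggregate: {scan_seconds(MAP_INPUT_BYTES, rate):.0f} s "
          f"for the whole input, {per_node_rate(rate, 5):.0f} MB/s per node on 5 nodes")
```

Comparing the job's wall-clock map time against these scan times gives a
quick read on how far the cluster is from that kind of sustained rate.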

Is it possible that the 50 MB/s on a single node was not a real number?  It
seems somewhat high but is probably reasonable with modern hardware.  Was the
file already in memory?
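
One way to probe the in-memory question is to time the same read twice. This
is only a sketch with made-up function names, not the test program from this
thread. If the second pass is much faster than the first, the operating
system's page cache is probably serving the file, and the first pass is closer
to the real disk rate. If both passes are equally fast, the file may have been
cached before the first pass as well.

```python
import time


def read_throughput_mb_s(path, chunk_size=1 << 20):
    """Read the whole file sequentially and return the rate in MB/s."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total / 1_000_000 / elapsed if elapsed > 0 else float("inf")


def compare_reads(path):
    """Return (first_pass, second_pass) throughput for the same file."""
    first = read_throughput_mb_s(path)
    second = read_throughput_mb_s(path)
    return first, second


# Usage, with a hypothetical path to one of the input files:
#   first, second = compare_reads("/data/sample-input.bin")
#   print(f"first pass: {first:.0f} MB/s, second pass: {second:.0f} MB/s")
```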

Ted Dunning, CTO
