hadoop-hdfs-dev mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Why do reads take as long as replicated writes?
Date Wed, 05 Nov 2014 01:00:52 GMT
Reads can be faster than writes for small bursts of I/O, in part because of
disk and memory caching of reads. (If you turn on write-back caching, which is
not recommended, your write and read times are likely to move closer together.)
As your volume of I/O increases, you tend to reach a point where you are bound
(more or less) by physical I/O and no longer benefit from the cache.
FYI, if what you are seeing is the result of cache misses, then reducing
the percentage of memory the OS uses for file system buffers should make
the effect you observed appear sooner, at a lower volume of I/O.
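
To make the caching argument concrete, here is a rough sketch in Python of
how effective read throughput might fall off as the working set outgrows the
buffer cache. Everything in it is an assumption for illustration: the function
name, the 16 GB cache, the 100 MB/s disk and 2000 MB/s memory figures, and the
crude hit ratio of cache size over working set size. It is not fitted to the
numbers in this thread.

```python
def effective_read_mb_s(working_set_gb, cache_gb, disk_mb_s, mem_mb_s):
    """Blend memory and disk throughput by an assumed cache hit ratio."""
    # Crude assumption: the fraction of reads served from cache is the
    # fraction of the working set that fits in the cache.
    hit = min(1.0, cache_gb / working_set_gb)
    # Time per MB is the hit/miss-weighted sum of the per-MB times.
    sec_per_mb = hit / mem_mb_s + (1.0 - hit) / disk_mb_s
    return 1.0 / sec_per_mb


if __name__ == "__main__":
    # Hypothetical hardware figures, chosen only to show the trend.
    CACHE_GB, DISK_MB_S, MEM_MB_S = 16, 100, 2000
    for n_files in (1, 2, 4, 8, 16, 32, 64):
        working_set_gb = 5 * n_files  # 5GB files, as in the benchmark
        rate = effective_read_mb_s(working_set_gb, CACHE_GB, DISK_MB_S, MEM_MB_S)
        print(f"{n_files:3d} files: ~{rate:7.1f} MB/s effective read rate")
```

Under these assumptions the read rate starts at memory speed and approaches
raw disk speed as the number of files grows. Shrinking the assumed cache size
moves that drop to a smaller number of files.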

If you look at iostat output, you may see that the read and write service
times on the devices are converging, which is another indication of cache misses.
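
One way to check this is to pull the read and write wait times out of
`iostat -x` output and compare them per device. The Python sketch below
assumes a sysstat version whose extended report has `r_await` and `w_await`
columns (older versions print only a combined `await`, in which case it
returns nothing). The function names and the 20% tolerance are arbitrary
choices for illustration.

```python
def parse_await(iostat_text):
    """Return {device: (r_await, w_await)} from `iostat -x` style text."""
    result = {}
    cols = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            cols = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            cols = fields  # remember the header so columns are found by name
            continue
        if (cols and "r_await" in cols and "w_await" in cols
                and len(fields) == len(cols)):
            # Later reports overwrite earlier ones for the same device.
            result[fields[0]] = (float(fields[cols.index("r_await")]),
                                 float(fields[cols.index("w_await")]))
    return result


def converging(r_await, w_await, tolerance=0.2):
    """True when the two wait times differ by at most `tolerance`,
    measured relative to the larger of the two."""
    larger = max(r_await, w_await)
    if larger == 0:
        return True
    return abs(r_await - w_await) / larger <= tolerance
```

For example, capture the output of `iostat -x 5 2` while the benchmark runs
and pass the text to `parse_await`. Since later reports overwrite earlier
ones, the result reflects the last interval rather than the first report,
which shows averages since boot.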

“The race is not to the swift, nor the battle to the strong, but to
those who can see it coming and jump aside.” - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Tue, Nov 4, 2014 at 3:42 PM, Andrew Wang <andrew.wang@cloudera.com>
wrote:

> I would advise against using TestDFSIO, instead trying TeraGen and
> TeraValidate. IIRC TestDFSIO doesn't actually schedule for task locality,
> so it's not very good if you have a cluster bigger than your replication
> factor. You might be network bound as you try to read more files.
>
> Best,
> Andrew
>
> On Tue, Nov 4, 2014 at 6:19 AM, Eitan Rosenfeld <eitan27@gmail.com> wrote:
>
> > I am benchmarking my cluster of 16 nodes (all in one rack) with TestDFSIO on
> > Hadoop 1.0.4.  For simplicity, I turned off speculative task execution and
> > set the max map and reduce tasks to 1.
> >
> > With a replication factor of 2, writing 1 file of 5GB takes twice as long as
> > reading 1 file. This result seems to make sense since the replication results
> > in twice the I/O in the cluster versus the read. However, as I scale up the
> > number of 5GB files from 1 to 64 files, reading ultimately takes as long as
> > writing. In particular, I see this result when writing and reading 64
> > such files.
> >
> > What could cause read performance to degrade faster than write performance
> > as the number of files increases?
> >
> > The full results (number of 5GB files, ratio of write time to read
> > time) are below:
> > 1,  2.02
> > 2,  1.87
> > 4,  1.73
> > 8,  1.54
> > 16,  1.37
> > 32,  1.29
> > 64,  1.01
> >
> > Thank you,
> >
> > Eitan
> >
>
