hadoop-common-user mailing list archives

From Stas Oskin <stas.os...@gmail.com>
Subject Re: HDFS read/write speeds, and read optimization
Date Fri, 10 Apr 2009 18:51:48 GMT

> Depends on what kind of I/O you do - are you going to be using MapReduce
> and co-locating jobs and data?  If so, it's possible to get close to those
> speeds if you are I/O bound in your job and read right through each chunk.
>  If you have multiple disks mounted individually, you'll need the number of
> streams equal to the number of disks.  If you're going to do I/O that's not
> through MapReduce, you'll probably be bound by the network interface.

Btw, this is what I wanted to ask as well:

Is it more efficient to unify the disks into one volume (RAID or LVM), and
then present them as a single space? Or is it better to specify each disk
separately?
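
To make the second option concrete, here is a rough sketch of what I mean by
listing each disk on its own. It assumes the disks are mounted at
/mnt/disk1 through /mnt/disk3 (hypothetical paths, just for illustration) and
that the setting goes through the dfs.data.dir property in hadoop-site.xml
(hdfs-site.xml in newer releases), which takes a comma-separated list of
directories:

```xml
<!-- Sketch only: the mount points below are made-up examples. -->
<property>
  <name>dfs.data.dir</name>
  <!-- One data directory per physical disk, comma-separated -->
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
</property>
```

With the RAID/LVM option, the same property would instead hold a single
directory on the combined volume.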

Reliability-wise, the latter sounds more correct, as one or several (up to
3) disks going down won't take the whole node with them. But perhaps there
is a performance penalty?
