hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: HDFS read/write speeds, and read optimization
Date Fri, 10 Apr 2009 17:15:56 GMT

On Apr 10, 2009, at 9:40 AM, Stas Oskin wrote:

> Hi.
>
>
>> Depends.  What hardware?  How much hardware?  Is the cluster under load?
>> What does your I/O load look like?  As a rule of thumb, you'll probably expect very close to hardware speed.
>>
>
> Standard Xeon dual-CPU, quad-core servers, 4 GB RAM.
> The DataNodes also do some processing, with usual loads about ~4 (from 8 recommended). The I/O load is linear; there are almost no write or read peaks.
>

Interesting -- machines are fairly RAM-poor for data processing ... I guess your tasks must be fairly efficient.

> By close to hardware speed, you mean results very near the results I get via iozone?

Depends on what kind of I/O you do -- are you going to be using MapReduce and co-locating jobs and data?  If so, it's possible to get close to those speeds if you are I/O bound in your job and read straight through each chunk.  If you have multiple disks mounted individually, you'll need as many concurrent streams as disks to saturate them.  If you're going to do I/O that's not through MapReduce, you'll probably be bound by the network interface.
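As a back-of-envelope illustration of the rule above, here is a small sketch (the function name and the disk/NIC numbers are hypothetical, not figures from this thread): data-local reads can aggregate across disks, while remote reads are capped by the network interface.

```python
# Rough HDFS read-throughput estimate (illustrative only).
def expected_throughput_mb_s(num_disks, disk_mb_s, nic_mb_s, local_read):
    """Aggregate read speed in MB/s.

    Data-local reads (e.g. a co-located MapReduce task) can be disk-bound,
    provided one stream per disk keeps all spindles busy; remote reads
    are additionally capped by the NIC.
    """
    disk_bound = num_disks * disk_mb_s
    return disk_bound if local_read else min(disk_bound, nic_mb_s)

# Hypothetical node: 4 disks at 80 MB/s each, gigabit NIC (~110 MB/s usable).
print(expected_throughput_mb_s(4, 80, 110, local_read=True))   # 320
print(expected_throughput_mb_s(4, 80, 110, local_read=False))  # 110
```

The point of the example is the asymmetry: local, disk-bound reads scale with the number of disks, while anything going over the wire flattens out at the NIC speed.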

Brian
