hadoop-common-user mailing list archives

From Alex Loddengaard <a...@cloudera.com>
Subject Re: HDFS read/write speeds, and read optimization
Date Fri, 10 Apr 2009 04:07:20 GMT
Answers in-line.

Alex

On Thu, Apr 9, 2009 at 3:45 PM, Stas Oskin <stas.oskin@gmail.com> wrote:

> Hi.
>
> I have 2 questions about HDFS performance:
>
> 1) How fast are the read and write operations over the network, in Mbps?

Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown:
<http://code.google.com/p/hypertable/wiki/KFSvsHDFS>

>
>
> 2) If the chunk server is located on the same host as the client, is there
> any optimization in read operations?
> For example, Kosmos FS describes the following functionality:
>
> "Localhost optimization: One copy of data
> is placed on the chunkserver on the same
> host as the client doing the write
>
> Helps reduce network traffic"

In Hadoop-speak, we're interested in DataNodes (storage nodes) and
TaskTrackers (compute nodes).  In terms of MapReduce, Hadoop tries to
schedule tasks so that the data a given task processes is stored on the
machine running that task.  As for writes, if the client loading the data
runs on a DataNode, HDFS will put a replica on that node.  However, if
you're loading data from, say, your local machine outside the cluster,
Hadoop will choose a DataNode at random.
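
To make the write-placement rule above concrete, here is a rough sketch in
Python.  It is a simplified model of the behavior described, not Hadoop's
actual code; the function name `choose_first_replica` and the host names are
made up for illustration.

```python
import random


def choose_first_replica(writer_host, datanodes, rng=random):
    """Pick the DataNode that receives a replica of a newly written block.

    Simplified model of the rule described above, not Hadoop's real
    placement code. All names here are hypothetical.
    """
    if not datanodes:
        raise ValueError("no DataNodes available")
    # The writer runs on a DataNode: keep a replica on that same host.
    if writer_host in datanodes:
        return writer_host
    # The writer is outside the cluster (e.g. a laptop): pick one at random.
    return rng.choice(list(datanodes))


datanodes = ["dn1", "dn2", "dn3"]
print(choose_first_replica("dn2", datanodes))     # writer is a DataNode
print(choose_first_replica("laptop", datanodes))  # writer is outside the cluster
```

The first call always returns the writer's own host; the second returns
whichever DataNode the random choice lands on.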

>
>
> Regards.
>
