hadoop-common-user mailing list archives

From Alex Loddengaard <a...@cloudera.com>
Subject Re: Few Queries..!!!
Date Fri, 05 Jun 2009 17:49:44 GMT

The throughput of HDFS is good, because each read is basically a stream from
several hard drives (each hard drive holds a different block of the file,
and these blocks are distributed across many machines).  That said, HDFS
does not have very good latency, at least compared to local file systems.

When you write a file using the HDFS client (whether it be Java or
bin/hadoop fs), the client and the name node coordinate to put your file on
various nodes in the cluster.  When you use that same client to read data,
your client asks the name node for the block locations of a given file and
then streams those blocks directly from the data nodes which store them,
using HDFS's TCP-based data transfer protocol.
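
To make that coordination concrete, here is a rough in-memory sketch of the
write and read paths.  It is a toy model and not Hadoop code: every class and
method name in it (NameNode, DataNode, Client, allocate, locate, roundTrip) is
made up for illustration, block placement is simple round-robin, and there is
no replication or networking.  The real client hides all of this behind
org.apache.hadoop.fs.FileSystem.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory model of the HDFS write and read paths.
// All names here are hypothetical; none of this is Hadoop's real API.
public class HdfsReadSketch {

    // Where one block of a file lives: which data node, under which block id.
    static final class BlockLocation {
        final int dataNode;
        final long blockId;
        BlockLocation(int dataNode, long blockId) {
            this.dataNode = dataNode;
            this.blockId = blockId;
        }
    }

    // A data node only stores raw block bytes, keyed by block id.
    static final class DataNode {
        private final Map<Long, byte[]> blocks = new HashMap<>();
        void store(long blockId, byte[] data) { blocks.put(blockId, data); }
        byte[] fetch(long blockId) { return blocks.get(blockId); }
    }

    // The name node holds metadata only: file name -> ordered block locations.
    static final class NameNode {
        private final Map<String, List<BlockLocation>> files = new HashMap<>();
        private long nextBlockId = 0;

        // Pick a data node for the next block of a file (round-robin, for illustration).
        BlockLocation allocate(String file, int dataNodeCount) {
            List<BlockLocation> locs = files.computeIfAbsent(file, f -> new ArrayList<>());
            BlockLocation loc = new BlockLocation(locs.size() % dataNodeCount, nextBlockId++);
            locs.add(loc);
            return loc;
        }

        List<BlockLocation> locate(String file) {
            return files.getOrDefault(file, Collections.<BlockLocation>emptyList());
        }
    }

    static final class Client {
        private final NameNode nameNode;
        private final List<DataNode> dataNodes;
        private final int blockSize;

        Client(NameNode nameNode, List<DataNode> dataNodes, int blockSize) {
            this.nameNode = nameNode;
            this.dataNodes = dataNodes;
            this.blockSize = blockSize;
        }

        // Write: split into blocks, ask the name node where each goes, send it there.
        void write(String file, byte[] data) {
            for (int off = 0; off < data.length; off += blockSize) {
                int end = Math.min(off + blockSize, data.length);
                BlockLocation loc = nameNode.allocate(file, dataNodes.size());
                dataNodes.get(loc.dataNode).store(loc.blockId, Arrays.copyOfRange(data, off, end));
            }
        }

        // Read: get block locations from the name node, then fetch each block
        // from the data node that stores it and concatenate them in order.
        byte[] read(String file) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (BlockLocation loc : nameNode.locate(file)) {
                byte[] block = dataNodes.get(loc.dataNode).fetch(loc.blockId);
                out.write(block, 0, block.length);
            }
            return out.toByteArray();
        }
    }

    // Write a string through the toy client, read it back, and return the result.
    static String roundTrip(String text, int dataNodeCount, int blockSize) {
        List<DataNode> nodes = new ArrayList<>();
        for (int i = 0; i < dataNodeCount; i++) {
            nodes.add(new DataNode());
        }
        Client client = new Client(new NameNode(), nodes, blockSize);
        client.write("/demo.txt", text.getBytes(StandardCharsets.UTF_8));
        return new String(client.read("/demo.txt"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello hdfs, stored as several blocks", 3, 8));
    }
}
```

Note that the name node in this sketch never touches file contents.  It only
hands out locations, and the bytes move between the client and the data nodes.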

You could in theory get data off of the local file system on your data
nodes, but this wouldn't make any sense, because the client does everything
for you already.

Hope this clears things up.


On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar wrote:

> Hello!
> Placing any kind of data into HDFS and then getting it back, can this
> activity be fast? Also, the node whose data I have to place in HDFS
> is a remote node. So then, will I have to use an RPC mechanism, or can I
> simply get the local filesystem of that node and do the things?
> --
> Regards!
> Sugandha
