hadoop-common-user mailing list archives

From Shivram Mani <sm...@pivotal.io>
Subject Re: read from a hdfs file on the same host as client
Date Mon, 13 Oct 2014 22:42:23 GMT
Demai, you are right. HDFS's default block placement policy
(BlockPlacementPolicyDefault) ensures that, when the writer runs on a
datanode, one replica of each block is placed on the writer's local datanode.
Replica selection for reads is likewise aimed at minimizing bandwidth and
latency: the namenode returns the block's locations sorted by network
distance to the client, so a replica on the reader's local node is served first.
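The local-first preference can be illustrated with a simplified sketch. This is not Hadoop's actual implementation (which sorts replicas by network distance via NetworkTopology, also considering rack locality); it only shows the core idea that a replica on the reader's own host is chosen when one exists:

```python
# Simplified illustration of HDFS read replica selection.
# NOT Hadoop's real code: the real client gets locations pre-sorted by the
# namenode's NetworkTopology (local node, then local rack, then off-rack).

def pick_replica(replica_hosts, reader_host):
    """Prefer a replica on the reader's own host; otherwise fall back."""
    for host in replica_hosts:
        if host == reader_host:
            return host  # local read: block data never crosses the network
    return replica_hosts[0]  # remote read: block is streamed over the network

# Example matching the thread: 3 replicas, reader running on host1.hdfs.com
replicas = ["host7.hdfs.com", "host1.hdfs.com", "host42.hdfs.com"]
print(pick_replica(replicas, "host1.hdfs.com"))   # host1.hdfs.com (local)
print(pick_replica(replicas, "host99.hdfs.com"))  # host7.hdfs.com (remote)
```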
If you want to optimize this further, you can set
'dfs.client.read.shortcircuit' to true. This allows a client that is
co-located with a replica to bypass the datanode's TCP path and read the
block files directly from local disk. Note that short-circuit reads also
require 'dfs.domain.socket.path' to be configured and the native libhadoop
library to be available on the client.
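As a rough sketch, short-circuit local reads are enabled in hdfs-site.xml along these lines (the socket path below is an example value; the path's parent directory must exist and be writable only by the datanode user or root):

```xml
<!-- hdfs-site.xml (illustrative values) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

You can check where a file's replicas actually live with
`hdfs fsck <path> -files -blocks -locations`, which lists the datanodes
holding each block.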

On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni <nidmgg@gmail.com> wrote:

> hi, folks,
>
> a very simple question, looking forward to a couple of pointers.
>
> Let's say I have an HDFS file, testfile, which has only one block (256MB),
> and that block has a replica on datanode host1.hdfs.com (the whole cluster
> may have 100 nodes, and the other two replicas are on other
> datanodes).
>
> If, on host1.hdfs.com, I do a "hadoop fs -cat testfile" or use a java client
> to read the file, should I assume there won't be any significant data
> movement through the network? That is, is the namenode smart enough to give me
> the data on host1.hdfs.com directly?
>
> thanks
>
> Demai
>



-- 
Thanks
Shivram
