hadoop-mapreduce-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: read from a hdfs file on the same host as client
Date Tue, 14 Oct 2014 01:36:04 GMT
Shivram,

Many thanks for confirming the behavior. I will also turn on short-circuit
reads as you suggested. I appreciate the help.

Demai

On Mon, Oct 13, 2014 at 3:42 PM, Shivram Mani <smani@pivotal.io> wrote:

> Demai, you are right. HDFS's default placement policy
> (BlockPlacementPolicyDefault) puts one replica of each block on the
> writer's datanode when the writer runs on a datanode. Replica selection
> for reads also aims to minimize bandwidth and latency, so the block is
> served from the reader's local node when a replica exists there.
> If you want to optimize this further, you can set
> 'dfs.client.read.shortcircuit' to true. This allows the client to bypass
> the datanode and read the block files directly from local disk.
>
> On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni <nidmgg@gmail.com> wrote:
>
>> Hi folks,
>>
>> A very simple question; I am looking forward to a couple of pointers.
>>
>> Let's say I have an HDFS file, testfile, which has only one block (256MB),
>> and the block has a replica on datanode host1.hdfs.com (the whole HDFS
>> cluster may have 100 nodes, though, and the other 2 replicas are on other
>> datanodes).
>>
>> If, on host1.hdfs.com, I run "hadoop fs -cat testfile" or use a Java client
>> to read the file, should I assume there won't be any significant data
>> movement through the network? That is, is the namenode smart enough to give
>> me the data on host1.hdfs.com directly?
>>
>> thanks
>>
>> Demai
>>
>
>
>
> --
> Thanks
> Shivram
>
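
For readers of this thread, the read-side replica selection described above can be pictured as ordering the block's replicas by network distance from the reader, nearest first. The following is a minimal sketch of that idea, not the actual HDFS code. It assumes a two-level "/rack/host" topology with the usual distance convention (0 for the same node, 2 for the same rack, 4 for a different rack). The class name ReplicaOrdering, the rack names, and the hosts host3.hdfs.com and host7.hdfs.com are hypothetical; only host1.hdfs.com comes from the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; this is NOT the real HDFS implementation.
class ReplicaOrdering {

    // Distance between two "/rack/host" locations under the assumed
    // convention: 0 = same node, 2 = same rack, 4 = different rack.
    static int distance(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        String rackA = a.substring(0, a.lastIndexOf('/'));
        String rackB = b.substring(0, b.lastIndexOf('/'));
        return rackA.equals(rackB) ? 2 : 4;
    }

    // Returns a copy of the replica list ordered nearest-first
    // relative to the reader's location.
    static List<String> sortByDistance(String reader, List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(r -> distance(reader, r)));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical locations of the three replicas of the single block.
        List<String> replicas = List.of(
                "/rack2/host7.hdfs.com",
                "/rack1/host3.hdfs.com",
                "/rack1/host1.hdfs.com");

        // The reader runs on host1.hdfs.com, so the local replica sorts first.
        List<String> ordered = sortByDistance("/rack1/host1.hdfs.com", replicas);
        System.out.println(ordered);
    }
}
```

With the reader on host1.hdfs.com, the local replica comes first in the ordered list, which is why a client on that host would be expected to read the block without significant network traffic. If the reader's host held no replica, the same ordering would fall back to a same-rack replica before an off-rack one.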
