hadoop-hdfs-user mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: caching in hdfs?
Date Mon, 20 Jul 2009 05:50:11 GMT
I am assuming that you are talking about a map-reduce job. In that case, if
you run your job twice, each mapper will contact the namenode every time the
mapper starts.

If you use FSDataInputStream to read an HDFS file, data is streamed from the
datanode(s) to the client. It is buffered as part of FSDataInputStream.
However, if you open the same file again and get another FSDataInputStream,
the buffer of the first stream is not shared with the buffer associated with
the second stream (although they refer to the same HDFS file).
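
To illustrate the point about per-stream buffers, here is a rough sketch using
only java.io, not the HDFS client itself. It is an analogy, under the
assumption that the same idea carries over: BufferedInputStream stands in for
the buffering inside FSDataInputStream, and a local temp file stands in for
the HDFS file. The class and method names (IndependentBuffers, CountingStream,
bytesFetchedByTwoOpens) are made up for this example.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndependentBuffers {

    /** Hypothetical helper: counts bytes pulled from the underlying source. */
    static final class CountingStream extends FilterInputStream {
        long bytesFetched = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                bytesFetched++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesFetched += n;
            }
            return n;
        }
    }

    /** Reads a stream to the end, discarding the data. */
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) {
            // nothing to do, we only care about the reads on the source
        }
    }

    /**
     * Opens the same file twice, each time behind its own buffer, reads both
     * streams fully, and returns the total bytes fetched from the source.
     */
    static long bytesFetchedByTwoOpens(Path file) throws IOException {
        CountingStream first = new CountingStream(Files.newInputStream(file));
        CountingStream second = new CountingStream(Files.newInputStream(file));
        try (InputStream a = new BufferedInputStream(first);
             InputStream b = new BufferedInputStream(second)) {
            drain(a);
            drain(b); // the second stream cannot reuse what the first buffered
        }
        return first.bytesFetched + second.bytesFetched;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".bin");
        try {
            Files.write(file, new byte[4096]);
            System.out.println("file size:     " + Files.size(file));
            System.out.println("bytes fetched: " + bytesFetchedByTwoOpens(file));
        } finally {
            Files.delete(file);
        }
    }
}
```

In this sketch the total fetched is twice the file size, because each stream
fills its own buffer from the source. If your job needs to avoid the second
fetch, it would have to keep the data itself or reuse the first stream.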


On Sun, Jul 19, 2009 at 10:11 PM, Iman E <hadoop_ami@yahoo.com> wrote:

> Hi,
> I would like to know if hdfs does caching by default at the slaves. If I run my
> job twice and I am assuming that the data is split the same way each time,
> is the namenode contacted every time to know the location of these files?
> Also, is the data read directly from disk every time or can it be read from
> the cache? I am using FSDataInputStream to open the files and read them.
> Thanks
> Iman
