Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
Date: Tue, 11 Aug 2015 10:53:32 -0700
Message-ID: 
 <CAOEq2C6bWYHSMdmFJGJM4zTqCt8KF9+zh=UcZMvWhJuz5EJjew@mail.gmail.com>
Subject: hadoop/hdfs cache question, do client processes share cache?
From: Demai Ni <nidmgg@gmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=047d7bb03c5ac8acf0051d0cc7d8

--047d7bb03c5ac8acf0051d0cc7d8
Content-Type: text/plain; charset=UTF-8

Hi folks,

I have a quick question about how HDFS handles caching. In this lab
experiment, I have a 4-node Hadoop cluster (2.x), and each node has a fairly
large memory (96GB). I have a single 256MB HDFS file, which fits in one
HDFS block. The local filesystem is on Linux.

Now, from one of the DataNodes, I started 10 Hadoop client processes that
repeatedly read the above file. My assumption is that HDFS will cache
the 256MB in memory, so that (after the 1st read) READs no longer involve
any disk I/O.

My question is: *how many COPIES of the 256MB will be in the memory of this
DataNode? 10 or 1?*

What if the 10 client processes are located on a 5th Linux box,
independent of the cluster? Will we have 10 copies of the 256MB or just
1?

Many thanks. Appreciate your help on this.

Demai

--047d7bb03c5ac8acf0051d0cc7d8--