Date: Tue, 11 Aug 2015 10:53:32 -0700
Subject: hadoop/hdfs cache question, do client processes share cache?
From: Demai Ni
To: user@hadoop.apache.org

Hi folks,

I have a quick question about how HDFS handles caching. In this lab experiment, I have a 4-node Hadoop cluster (2.x), and each node has a fairly large amount of memory (96 GB). I have a single 256 MB HDFS file, which fits in one HDFS block. The local filesystem is on Linux.

From one of the DataNodes, I started 10 Hadoop client processes that repeatedly read the file above. My assumption is that HDFS will cache the 256 MB in memory, so that after the first read, subsequent reads involve no disk I/O.

My question is: *how many copies of the 256 MB will be held in memory on this DataNode, 10 or 1?*

And what if the 10 client processes run on a fifth Linux box that is independent of the cluster? Will we have 10 copies of the 256 MB or just 1?

Many thanks. I appreciate your help on this.

Demai
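
P.S. To make the two outcomes in the question concrete, here is a minimal sketch of the arithmetic only. The helper `cached_footprint_mb` is a made-up name for illustration, not a Hadoop API, and the script says nothing about which outcome HDFS actually produces; it just compares one shared copy against one copy per client, using the numbers from the setup (256 MB file, 10 clients, 96 GB per node).

```python
# Illustrative arithmetic only; nothing here talks to HDFS.
FILE_SIZE_MB = 256     # the single-block HDFS file from the setup
NUM_CLIENTS = 10       # client processes reading the file
NODE_MEMORY_GB = 96    # memory on each node


def cached_footprint_mb(copies: int, file_size_mb: int = FILE_SIZE_MB) -> int:
    """Memory needed to hold `copies` in-memory copies of the file, in MB.

    Hypothetical helper: it models the two cases asked about,
    not how HDFS or the OS actually caches data.
    """
    return copies * file_size_mb


shared = cached_footprint_mb(1)                 # case "1 copy"
per_client = cached_footprint_mb(NUM_CLIENTS)   # case "10 copies"
node_memory_mb = NODE_MEMORY_GB * 1024

print(f"one shared copy:     {shared} MB")
print(f"one copy per client: {per_client} MB")
print(
    "share of node memory: "
    f"{shared / node_memory_mb:.2%} vs {per_client / node_memory_mb:.2%}"
)
```

Either way the data fits comfortably in 96 GB, so the difference matters mostly when the file or the number of clients grows.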