From: Yoonmin Nam <ronymin@dgist.ac.kr>
To: hdfs-dev@hadoop.apache.org
Date: Sat, 21 Mar 2015 17:53:17 +0900 (KST)
Subject: Does "copyToLocal" not consider the block locality?

 

Hello everyone.

I have run into a very strange situation with an HDFS operation.

 

I have a cluster with 1 master and 10 slave nodes.

 

When I put a file A into HDFS with dfs.replication=10, I can see that every block of file A is replicated on every node.

So it seems reasonable to expect that the HDFS file reader can act as a local block reader when I read file A.
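For reference, the write-and-verify steps look roughly like this (the HDFS path /data/A is illustrative, not the actual path I used):

```shell
# Illustrative path. Write file A with a replication factor of 10:
hdfs dfs -D dfs.replication=10 -put A /data/A

# fsck lists every block of the file and the datanodes holding each replica,
# which is how I confirmed all 10 nodes have a copy of every block:
hdfs fsck /data/A -files -blocks -locations
```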

 

However, when I execute hdfs dfs -copyToLocal A /to/my/localDir, the file read time is the same as in the dfs.replication=1 case.

 

So I monitored the network usage, especially read and write traffic.

Both cases, dfs.replication=1 and dfs.replication=10, fully saturate the network.

This suggests that reading the file does not take block location into account.
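Roughly, the measurement looked like this (the HDFS path, local directory, tool choice, and interface name are all illustrative):

```shell
# Time the copy under each replication setting:
time hdfs dfs -copyToLocal /data/A /to/my/localDir

# Meanwhile, watch per-interface traffic on the client node; any network
# monitor works, nload is just one example, and eth0 is a placeholder:
nload eth0
```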

 

Is this the expected behavior of HDFS?

If so, what is the true meaning of data locality in HDFS? (We all know about the data locality of map tasks.)

 

I want to know why both "copyToLocal" cases show the same performance.

 

Thanks!

Yoonmin

 




// Yoonmin Nam

