From: "Thanh Hong Dai"
To: user@hadoop.apache.org
Subject: A question regarding memory usage on NameNode and replication
Date: Tue, 22 Mar 2016 12:08:08 +0700

Hi,

To get to the point: does the number of replicas of a block increase the memory requirement on the NameNode, and by how much?

The calculation in this paper from Yahoo! (https://www.usenix.org/legacy/publications/login/2010-04/openpdfs/shvachko.pdf) assumes 200 bytes per metadata object; with 1.5 blocks per file, each file needs 3 objects (1 for the file, 2 for the blocks). The replication factor is not mentioned in the paper and does not participate in the calculation.
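In case my reading of that method is part of the question, here is the paper's arithmetic as I understand it, as a small Python sketch (the 200 bytes per object and the 100-million-file scale are the paper's figures; the constant and function names are mine):

    # The Yahoo! paper's estimate as I read it: one metadata object for
    # the file plus two for its (on average 1.5) blocks, 200 bytes each.
    # Replication does not appear anywhere in this method.
    BYTES_PER_OBJECT = 200   # the paper's assumed object size
    OBJECTS_PER_FILE = 3     # 1 file object + 2 block objects

    def namenode_heap_bytes(num_files):
        return num_files * OBJECTS_PER_FILE * BYTES_PER_OBJECT

    # The paper's example scale: 100 million files -> 60 GB of heap.
    print(namenode_heap_bytes(100_000_000) / 1e9, "GB")  # 60.0 GB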
This email in the mailing list (https://www.mail-archive.com/core-user@hadoop.apache.org/msg02835.html) assumes 150 bytes per metadata object, but its calculation is off by an order of magnitude: 1M files (1 block each) will use 2M metadata objects (1 for the file, 1 for the block), which comes to 300 MB, not 3 GB. This article from Cloudera (http://blog.cloudera.com/blog/2009/02/the-small-files-problem/) cites that mail, but corrects the number to match the figure. The replication factor is not mentioned in either case and does not participate in the calculation.

This answer on StackOverflow (https://stackoverflow.com/questions/10764493/namenode-file-quantity-limit) adds two metadata objects (one for the file and one for the block) for each replica, which does not match the method of calculation in the links above.

Which of these is correct? Does replication add one metadata object per block replica, or only slightly increase the size of each metadata object? The sketch below shows the two readings side by side.
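For concreteness, here is how I understand the two methods, again as a sketch (the 150 bytes per object is the mailing list's figure; both functions and the replication factor of 3 are my own illustration, not anything taken from Hadoop itself):

    BYTES_PER_OBJECT = 150

    def heap_replication_independent(files, blocks):
        # Mailing list / Cloudera reading: one object per file plus one
        # per block, regardless of the replication factor.
        return (files + blocks) * BYTES_PER_OBJECT

    def heap_per_replica(files, blocks, replication=3):
        # StackOverflow reading: the file and block objects are counted
        # once per replica.
        return (files + blocks) * replication * BYTES_PER_OBJECT

    # 1M single-block files: 300 MB vs. 900 MB at replication 3.
    print(heap_replication_independent(10**6, 10**6) / 1e6, "MB")  # 300.0
    print(heap_per_replica(10**6, 10**6) / 1e6, "MB")              # 900.0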
Best regards,

Hong Dai Thanh