From: Ascot Moss <ascot.moss@gmail.com>
Date: Sun, 5 Jun 2016 16:14:32 +0800
Subject: Re: HDFS2 vs MaprFS
To: daemeon reiydelle <daemeonr@gmail.com>
Cc: Gavin Yue <yue.yuanyuan@gmail.com>, user@hadoop.apache.org

Will the common pool of datanodes and namenode federation in HDFS2 be a
more effective alternative than multiple clusters? (Two short sketches are
appended after the quoted thread below: one of a federated namespace
sharing a datanode pool, and one of how clients obtain block locality from
the namenode.)

On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle <daemeonr@gmail.com> wrote:

> There are indeed many tuning points here. If the namenodes and journal
> nodes can be larger, perhaps even bonding multiple 10 GbE NICs, one can
> easily scale. I did have one client where the file counts forced multiple
> clusters, but we were able to differentiate by airframe type, e.g. fixed
> wing in one, rotary subsonic in another, etc.
>
> sent from my mobile
> Daemeon C.M. Reiydelle
> USA 415.501.0198
> London +44.0.20.8144.9872
>
> On Jun 4, 2016 2:23 PM, "Gavin Yue" <yue.yuanyuan@gmail.com> wrote:
>
>> Here is what I found on the Hortonworks website:
>>
>> *Namespace scalability*
>>
>> While HDFS cluster storage scales horizontally with the addition of
>> datanodes, the namespace does not. Currently the namespace can only be
>> vertically scaled on a single namenode. The namenode stores the entire
>> file system metadata in memory. This limits the number of blocks, files,
>> and directories supported on the file system to what can be accommodated
>> in the memory of a single namenode. A typical large deployment at Yahoo!
>> includes an HDFS cluster with 2700-4200 datanodes with 180 million files
>> and blocks, addressing ~25 PB of storage. At Facebook, HDFS has around
>> 2600 nodes, 300 million files and blocks, addressing up to 60 PB of
>> storage. While these are very large systems and good enough for the
>> majority of Hadoop users, a few deployments that might want to grow even
>> larger could find the namespace scalability limiting.
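For a rough sense of what those figures mean for namenode memory, the sketch
below does the arithmetic, assuming the commonly cited rule of thumb of
roughly 150 bytes of namenode heap per namespace object (file, directory, or
block). The class and the 150-byte constant are illustrative assumptions; the
real per-object cost varies by Hadoop release and by the blocks-per-file
ratio, so treat the output as an order-of-magnitude estimate only.

public class NamenodeHeapEstimate {

    // Assumed rule of thumb: ~150 bytes of namenode heap per namespace
    // object (file, directory, or block). Approximation, not an exact figure.
    static final long BYTES_PER_OBJECT = 150L;

    static double estimateGiB(long namespaceObjects) {
        return (double) (namespaceObjects * BYTES_PER_OBJECT) / (1L << 30);
    }

    public static void main(String[] args) {
        // File-and-block counts quoted in the passage above.
        System.out.printf("Yahoo!   (180M files and blocks): ~%.0f GiB heap%n",
                estimateGiB(180_000_000L));
        System.out.printf("Facebook (300M files and blocks): ~%.0f GiB heap%n",
                estimateGiB(300_000_000L));
    }
}

Roughly 25-45 GiB of heap is still manageable on a single machine, which
matches the passage's point: these deployments work today, but growth much
beyond that range runs into the single-namenode ceiling that federation is
meant to remove.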
>> On Jun 4, 2016, at 04:43, Ascot Moss <ascot.moss@gmail.com> wrote:
>>
>> Hi,
>>
>> I read some (old?) articles on the Internet comparing MapR-FS and HDFS:
>>
>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>
>> It states that HDFS Federation has:
>>
>> a) "Multiple Single Points of Failure", is it really true?
>> Why does MapR compare against HDFS rather than HDFS2? That makes the
>> comparison unfair, or even misleading: HDFS is from Hadoop 1.x, the old
>> generation, while HDFS2 has been available since 2013-10-15 and no longer
>> has a single point of failure.
>>
>> b) "Limit to 50-200 million files", is it really true?
>> I have seen many real-world Hadoop clusters with over 10 PB of data, some
>> even with 150 PB. If the 50-200 million file limit were true in HDFS2,
>> why are there so many production Hadoop clusters in the real world, and
>> how do they manage that limit? For instance, Facebook's "Like"
>> implementation runs on HBase at web scale; I can imagine HBase generating
>> a huge number of files in Facebook's Hadoop cluster, so the file count
>> there should be far larger than 50-200 million.
>>
>> From my point of view it is the other way around: MapR-FS has a limit of
>> up to 1T (one trillion) files, while HDFS2 with federation can handle an
>> essentially unlimited number of files. Please correct me if I am wrong.
>>
>> c) "Performance Bottleneck", again, is it really true?
>> MapR-FS drops the namenode in order to gain file system performance. But
>> without a namenode, MapR-FS would lose data locality, which is one of the
>> beauties of Hadoop. If data locality is no longer available, a big data
>> application running on MapR-FS might gain some raw file system
>> performance, yet it would lose the much larger performance gain that data
>> locality, provided through Hadoop's namenode, delivers (gain small, lose
>> big).
>>
>> d) "Commercial NAS required"
>> Is there any wiki/blog/discussion about commercial NAS and HDFS
>> Federation?
>>
>> regards
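Back to my first question about federation versus multiple clusters: below is
a minimal client-side sketch of what an HDFS2 federated setup looks like from
an application's point of view. The nameservice IDs ns1 and ns2 and the paths
are hypothetical and assumed to be defined in the cluster's hdfs-site.xml;
this is only an illustration of the model, not a tested configuration.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederatedNamespaces {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath; the
        // federated nameservices are assumed to be configured there.
        Configuration conf = new Configuration();

        // Two independent namespaces (two namenodes), one shared pool of datanodes.
        FileSystem ns1 = FileSystem.get(URI.create("hdfs://ns1/"), conf);
        FileSystem ns2 = FileSystem.get(URI.create("hdfs://ns2/"), conf);

        // Each namenode manages only its own namespace and block pool, so the
        // metadata load is split across namenodes while the blocks themselves
        // land on the same datanodes.
        ns1.mkdirs(new Path("/projects/fixed-wing"));
        ns2.mkdirs(new Path("/projects/rotary"));

        ns1.close();
        ns2.close();
    }
}

Whether that is more effective than separate clusters still depends on the
kind of operational split daemeon describes, but it does remove the
single-namenode memory ceiling without multiplying physical clusters.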
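And on point c) about data locality: below is a minimal sketch of how a
client (or a scheduler such as MapReduce/YARN) asks the namenode where a
file's blocks live, using the standard FileSystem.getFileBlockLocations API;
that block-location metadata is what makes data-local task placement
possible. The path is hypothetical, and the snippet assumes a reachable HDFS
cluster configured on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/example.txt");   // hypothetical path
        FileStatus status = fs.getFileStatus(file);

        // The namenode answers with the datanodes holding each block; a
        // scheduler uses these host names to place tasks next to the data.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }

        fs.close();
    }
}

Whether MapR-FS exposes an equivalent locality path through its own metadata
service is exactly the part of the comparison I would like to see discussed.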