hadoop-common-user mailing list archives

From Ascot Moss <ascot.m...@gmail.com>
Subject Re: HDFS2 vs MaprFS
Date Sun, 05 Jun 2016 08:14:32 GMT
Would a common pool of datanodes with namenode federation be a more
effective alternative in HDFS2 than multiple clusters?
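
For a rough sense of scale, here is a back-of-the-envelope sketch of the
namespace limit discussed in the quoted text below. It assumes about 150
bytes of namenode heap per namespace object (file, directory, or block),
which is a commonly cited rule of thumb and not a figure from this thread,
and it assumes a federated namespace splits evenly across namenodes, which
real deployments rarely do. The function names and the 64 GB heap budget
are made up for illustration.

```python
import math

# Assumed rule of thumb: ~150 bytes of namenode heap per namespace object
# (file, directory, or block). Not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(namespace_objects, bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the heap needed to hold the whole namespace in memory."""
    return namespace_objects * bytes_per_object


def namespaces_needed(namespace_objects, heap_budget_bytes,
                      bytes_per_object=BYTES_PER_OBJECT):
    """Estimate how many federated namenodes are needed, assuming the
    namespace could be split evenly (a simplification)."""
    objects_per_namenode = heap_budget_bytes // bytes_per_object
    if objects_per_namenode <= 0:
        raise ValueError("heap budget is too small to hold a single object")
    return math.ceil(namespace_objects / objects_per_namenode)


if __name__ == "__main__":
    # Object counts taken from the Yahoo! and Facebook figures quoted below.
    for label, objects in [("Yahoo!", 180_000_000), ("Facebook", 300_000_000)]:
        gb = namenode_heap_bytes(objects) / 1e9
        print(f"{label}: roughly {gb:.0f} GB of namenode heap")

    # Hypothetical: one billion objects with a 64 GB heap per namenode.
    print(namespaces_needed(1_000_000_000, 64 * 10**9), "namenodes")
```

Under these assumptions the limit comes from namenode memory, not from
datanode storage, so federation raises the ceiling only in proportion to
the number of namenodes added.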

On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle <daemeonr@gmail.com>
wrote:

> There are indeed many tuning points here. If the namenodes and journal
> nodes can be made larger, perhaps even bonding multiple 10 Gbit NICs, one
> can scale easily. I did have one client where the file counts forced
> multiple clusters. But we were able to differentiate by airframe type,
> e.g. fixed wing in one, rotary subsonic in another, etc.
>
> sent from my mobile
> Daemeon C.M. Reiydelle
> USA 415.501.0198
> London +44.0.20.8144.9872
> On Jun 4, 2016 2:23 PM, "Gavin Yue" <yue.yuanyuan@gmail.com> wrote:
>
>> Here is what I found on the Hortonworks website.
>>
>>
>> *Namespace scalability*
>>
>> While HDFS cluster storage scales horizontally with the addition of
>> datanodes, the namespace does not. Currently the namespace can only be
>> vertically scaled on a single namenode.  The namenode stores the entire
>> file system metadata in memory. This limits the number of blocks, files,
>> and directories supported on the file system to what can be accommodated in
>> the memory of a single namenode. A typical large deployment at Yahoo!
>> includes an HDFS cluster with 2700-4200 datanodes with 180 million files
>> and blocks, addressing ~25 PB of storage.  At Facebook, HDFS has around
>> 2600 nodes, 300 million files and blocks, addressing up to 60 PB of storage.
>> While these are very large systems and good enough for the majority of
>> Hadoop users, a few deployments that might want to grow even larger could
>> find the namespace scalability limiting.
>>
>>
>>
>> On Jun 4, 2016, at 04:43, Ascot Moss <ascot.moss@gmail.com> wrote:
>>
>> Hi,
>>
>> I read some (old?) articles on the Internet about MapR-FS vs HDFS.
>>
>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>
>> It states that HDFS Federation has
>>
>> a) "Multiple Single Points of Failure". Is this really true?
>> Why does MapR use HDFS rather than HDFS2 in its comparison? That makes the
>> comparison unfair, or even misleading. (HDFS is from Hadoop 1.x, the old
>> generation.) HDFS2 has been available since 2013-10-15, and there is no
>> single point of failure in HDFS2.
>>
>> b) "Limit to 50-200 million files". Is this really true?
>> I have seen many real-world Hadoop clusters with over 10 PB of data, some
>> even with 150 PB. If the "Limit to 50-200 million files" were true of
>> HDFS2, why would there be so many production Hadoop clusters in the real
>> world? How do they manage the "Limit to 50-200 million files"? For
>> instance, Facebook's "Like" implementation runs on HBase at web scale. I
>> imagine HBase generates a huge number of files in Facebook's Hadoop
>> cluster, so the number of files there should be much larger than 50-200
>> million.
>>
>> From my point of view, in contrast, MapR-FS has a real limit of up to 1T
>> files, while HDFS2 can handle a truly unlimited number of files. Please
>> correct me if I am wrong.
>>
>> c) "Performance Bottleneck". Again, is this really true?
>> MapR-FS has no namenode, in order to gain file system performance.
>> Without a namenode, though, MapR-FS would lose data locality, which is one
>> of the beauties of Hadoop. If data locality is no longer available, a big
>> data application running on MapR-FS might gain some file system
>> performance, but it would lose the real performance gain from the data
>> locality provided by Hadoop's namenode (gain small, lose big).
>>
>> d) "Commercial NAS required"
>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>> Federation?
>>
>> regards
>>
>>
>>
>>
