hadoop-mapreduce-user mailing list archives

From Ascot Moss <ascot.m...@gmail.com>
Subject HDFS2 vs MaprFS
Date Sat, 04 Jun 2016 11:43:58 GMT

I read some (old?) articles from the Internet comparing MapR-FS with HDFS.


They state that HDFS Federation has:

a) "Multiple Single Points of Failure", is it really true?
Why does MapR compare against HDFS rather than HDFS2? That makes the
comparison unfair, or even misleading (HDFS was from Hadoop 1.x, the old
generation). HDFS2 has been available since 2013-10-15, and with HA
NameNodes there is no Single Point of Failure in HDFS2.
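For reference, HA in HDFS2 is configured in hdfs-site.xml along these lines. This is a minimal sketch, not a full working config; the nameservice name "mycluster" and the hostnames nn1/nn2/jn1-jn3 are placeholders I made up:

```xml
<!-- Minimal HA sketch for hdfs-site.xml. "mycluster" and the host
     names are placeholders, not values from any real cluster. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- Shared edit log on a JournalNode quorum (QJM) -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <!-- With ZKFC, failover between nn1 and nn2 is automatic -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

With a standby NameNode taking over on failure, the "single point of failure" criticism applies to Hadoop 1.x HDFS, not to HDFS2.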

b) "Limit to 50-200 million files", is it really true?
I have seen so many real world Hadoop Clusters with over 10PB data, some
even with 150PB data.  If "Limit to 50 -200 millions files" were true in
HDFS2, why are there so many production Hadoop clusters in real world? how
can they mange well the issue of  "Limit to 50-200 million files"? For
instances,  the Facebook's "Like" implementation runs on HBase at Web
Scale, I can image HBase generates huge number of files in Facbook's Hadoop
cluster, the number of files in Facebook's Hadoop cluster should be much
much bigger than 50-200 million.

From my point of view, in contrast, it is MapR-FS that has a hard limit
(up to about 1 trillion files), while HDFS2 can handle an effectively
unlimited number of files. Please do correct me if I am wrong.
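On the limit question, the usual explanation is that the NameNode keeps the entire namespace in heap memory, at roughly 150 bytes per namespace object (file or block). That figure and the blocks-per-file ratio below are rule-of-thumb assumptions, not exact numbers, but they let us sketch why the "limit" is really just a heap-sizing question:

```python
# Back-of-envelope NameNode heap sizing.
# Assumptions (rules of thumb, not exact figures):
#   - each namespace object (file or block) costs ~150 bytes of heap
#   - an average file spans ~1.5 blocks
BYTES_PER_OBJECT = 150

def namenode_heap_gb(num_files, blocks_per_file=1.5):
    """Rough heap (GB) needed to hold num_files in the namespace."""
    objects = num_files * (1 + blocks_per_file)  # files + their blocks
    return objects * BYTES_PER_OBJECT / 1e9

# 200 million files -> ~75 GB of heap, large but feasible today
print(namenode_heap_gb(200_000_000))
```

So "50-200 million files" is not an architectural wall: a bigger heap (and, beyond that, HDFS Federation splitting the namespace across NameNodes) moves the ceiling.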

c) "Performance Bottleneck", again, is it really true?
MaprFS does not have namenode in order to gain file system performance. If
without Namenode, MaprFS would lose Data Locality which is one of the
beauties of Hadoop  If Data Locality is no longer available, any big data
application running on MaprFS might gain some file system performance but
it would totally lose the true gain of performance from Data Locality
provided by Hadoop's namenode (gain small lose big)

d) "Commercial NAS required"
Is there any wiki/blog/discussion about Commercial NAS on Hadoop Federation?

