hadoop-hdfs-user mailing list archives

From Marcin Tustin <mtus...@handybook.com>
Subject Re: HDFS2 vs MaprFS
Date Sun, 05 Jun 2016 15:19:15 GMT
The namenode architecture is a source of fragility in HDFS. While a high
availability deployment (with two namenodes, and a failover mechanism)
means you're unlikely to see service interruption, it is still possible to
have a complete loss of filesystem metadata with the loss of two machines.

Secondly, because HDFS identifies datanodes by their hostname/IP, DNS
changes can cause havoc with HDFS (see my war story on this here:

Also, the namenode/datanode architecture probably does contribute to the
small files problem, since the namenode keeps the metadata for every file
in memory. That said, there are a lot of practical solutions for the small
files problem.
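
To put rough numbers on the small files problem, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb that each file and each block costs the namenode about 150 bytes of heap, and a 128 MiB block size; both figures are assumptions for illustration, and the function name is made up for this example.

```python
def estimate_namenode_heap_bytes(total_bytes, file_size,
                                 block_size=128 * 1024**2,
                                 bytes_per_object=150):
    """Estimate namenode heap used to track total_bytes stored as equal-size files.

    Assumption (rule of thumb, not a measured constant): every file and every
    block is one in-memory object of about bytes_per_object bytes.
    """
    # Ceiling division without floating point.
    num_files = -(-total_bytes // file_size)
    blocks_per_file = max(1, -(-file_size // block_size))
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * bytes_per_object


one_tib = 1024**4
small = estimate_namenode_heap_bytes(one_tib, file_size=1024**2)        # 1 MiB files
large = estimate_namenode_heap_bytes(one_tib, file_size=128 * 1024**2)  # 128 MiB files
print(small // 1024**2, "MiB of heap for 1 MiB files")
print(round(large / 1024**2, 2), "MiB of heap for 128 MiB files")
```

Under those assumptions, the same 1 TiB of data costs the namenode roughly 128 times more heap when stored as 1 MiB files than as 128 MiB files, which is why compacting small files into larger ones is the usual remedy.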

If you're just setting up a data infrastructure, I would say consider
alternatives before you pick HDFS. If you run in AWS, S3 is a good
alternative. If you run in some other cloud, it's probably worth
considering whatever their equivalent storage system is.

On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <ascot.moss@gmail.com> wrote:

> Hi,
> I read some (old?) articles on the Internet about Mapr-FS vs HDFS.
> https://www.mapr.com/products/m5-features/no-namenode-architecture
> It states that HDFS Federation has
> a) "Multiple Single Points of Failure", is it really true?
> Why does MapR use HDFS rather than HDFS2 in its comparison? This would
> lead to an unfair (or even misleading) comparison. (HDFS was from
> Hadoop 1.x, the old generation.) HDFS2 has been available since
> 2013-10-15, and there is no Single Point of Failure in HDFS2.
> b) "Limit to 50-200 million files", is it really true?
> I have seen so many real-world Hadoop clusters with over 10PB of data,
> some even with 150PB. If "Limit to 50-200 million files" were true in
> HDFS2, why are there so many production Hadoop clusters in the real
> world? How can they manage the issue of "Limit to 50-200 million files"?
> For instance, Facebook's "Like" implementation runs on HBase at Web
> Scale. I can imagine HBase generates a huge number of files in Facebook's
> Hadoop cluster; the number of files in Facebook's Hadoop cluster should
> be much, much bigger than 50-200 million.
> From my point of view, in contrast, MaprFS should have a true limit of up
> to 1T files while HDFS2 can handle truly unlimited files; please do correct
> me if I am wrong.
> c) "Performance Bottleneck", again, is it really true?
> MaprFS does not have a namenode, in order to gain file system performance.
> Without a namenode, MaprFS would lose Data Locality, which is one of the
> beauties of Hadoop. If Data Locality is no longer available, any big data
> application running on MaprFS might gain some file system performance, but
> it would totally lose the true performance gain from Data Locality
> provided by Hadoop's namenode (gain small, lose big).
> d) "Commercial NAS required"
> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
> Federation?
> regards
