hadoop-mapreduce-user mailing list archives

From Ascot Moss <ascot.m...@gmail.com>
Subject Re: HDFS2 vs MaprFS
Date Mon, 06 Jun 2016 15:32:17 GMT
Since MapR is proprietary, I find that it has many compatibility issues with
Apache open source projects or, even worse, loses Hadoop features. For
instance, Hadoop has a built-in storage policy named COLD; where is it in
MapR-FS? Not to mention that MapR-FS loses data locality.
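
For reference, a minimal sketch of how the COLD policy is applied to a
directory on the HDFS side. It assumes the `hdfs storagepolicies` subcommand
available in Hadoop 2.7 and later; the path /data/archive and the helper name
are hypothetical, chosen only for illustration. The script builds and prints
the command and does not contact a cluster.

```python
import shlex


def storage_policy_command(path, policy="COLD"):
    """Build the argv for `hdfs storagepolicies -setStoragePolicy`.

    Assumes the Hadoop 2.7+ CLI syntax; `path` is a hypothetical HDFS directory.
    """
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


if __name__ == "__main__":
    cmd = storage_policy_command("/data/archive")
    # Print a shell-safe version of the command instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

COLD keeps all replicas on ARCHIVE storage, so the datanodes need volumes
tagged as ARCHIVE for the policy to have any effect.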

On Mon, Jun 6, 2016 at 11:26 PM, Ascot Moss <ascot.moss@gmail.com> wrote:

> I don't think HDFS2 needs a SAN; the Quorum Journal approach is much
> better than the shared edits directory (SAN) approach.
>
>
>
>
> On Monday, June 6, 2016, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>
>> It is very common practice to back up the metadata to some SAN store, so
>> a complete loss of all the metadata is preventable. You could
>> lose a day's worth of data if, e.g., you back up the metadata once a day, but
>> you could do it more frequently. I'm not saying S3 or Azure Blob are bad ideas.
>>
>> On Sun, Jun 5, 2016 at 8:19 AM, Marcin Tustin <mtustin@handybook.com>
>> wrote:
>>
>>> The namenode architecture is a source of fragility in HDFS. While a high
>>> availability deployment (with two namenodes, and a failover mechanism)
>>> means you're unlikely to see service interruption, it is still possible to
>>> have a complete loss of filesystem metadata with the loss of two machines.
>>>
>>> Secondly, because HDFS identifies datanodes by their hostname/ip, dns
>>> changes can cause havoc with HDFS (see my war story on this here:
>>> https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab
>>> ).
>>>
>>> Also, the namenode/datanode architecture probably does contribute to the
>>> small files problem. That said, there are a lot of practical
>>> solutions for the small files problem.
>>>
>>> If you're just setting up a data infrastructure, I would say consider
>>> alternatives before you pick HDFS. If you run in AWS, S3 is a good
>>> alternative. If you run in some other cloud, it's probably worth
>>> considering whatever their equivalent storage system is.
>>>
>>>
>>> On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <ascot.moss@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I read some (old?) articles on the Internet about MapR-FS vs HDFS.
>>>>
>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>
>>>> It states that HDFS Federation has
>>>>
>>>> a) "Multiple Single Points of Failure", is it really true?
>>>> Why does MapR use HDFS rather than HDFS2 in its comparison? This leads
>>>> to an unfair (or even misleading) comparison. (HDFS was from
>>>> Hadoop 1.x, the old generation.) HDFS2 has been available since
>>>> 2013-10-15, and there is no single point of failure in HDFS2.
>>>>
>>>> b) "Limit to 50-200 million files", is it really true?
>>>> I have seen many real-world Hadoop clusters with over 10PB of data,
>>>> some even with 150PB. If "Limit to 50-200 million files" were true
>>>> in HDFS2, why are there so many production Hadoop clusters in the real
>>>> world? How do they manage the issue of "Limit to 50-200 million files"?
>>>> For instance, Facebook's "Like" implementation runs on HBase at web
>>>> scale. I can imagine HBase generates a huge number of files in Facebook's
>>>> Hadoop cluster, so the number of files in that cluster should be much
>>>> bigger than 50-200 million.
>>>>
>>>> From my point of view, in contrast, MapR-FS should have a true limit
>>>> of up to 1T files while HDFS2 can handle a truly unlimited number of
>>>> files; please do correct me if I am wrong.
>>>>
>>>> c) "Performance Bottleneck", again, is it really true?
>>>> MapR-FS does not have a namenode, in order to gain file system performance.
>>>> Without a namenode, MapR-FS would lose data locality, which is one of the
>>>> beauties of Hadoop. If data locality is no longer available, any big data
>>>> application running on MapR-FS might gain some file system performance, but
>>>> it would totally lose the real performance gain from the data locality
>>>> provided by Hadoop's namenode (gain small, lose big).
>>>>
>>>> d) "Commercial NAS required"
>>>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>>>> Federation?
>>>>
>>>> regards
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
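
On the "50-200 million files" figure in the original question: a rough
back-of-the-envelope sketch of where numbers of that order come from. It
assumes the commonly quoted rule of thumb of about 150 bytes of namenode heap
per namespace object (file, directory, or block); the real cost varies by
Hadoop version and path length, and the function name and parameters are
illustrative only.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not a measured value


def namenode_heap_gib(num_files, blocks_per_file=1.0,
                      bytes_per_object=BYTES_PER_OBJECT):
    """Estimate namenode heap (GiB) needed to track `num_files` files."""
    # Each file contributes one inode object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 2**30


if __name__ == "__main__":
    for files in (50_000_000, 200_000_000, 1_000_000_000):
        print(f"{files:>13,} files -> ~{namenode_heap_gib(files):.1f} GiB heap")
```

Under these assumptions the figure is a heap-sizing question for a single
namenode rather than a hard cap, and it says nothing about total bytes stored:
a 150PB cluster of large files can hold far fewer files than a small cluster
of tiny ones.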
