hadoop-hdfs-user mailing list archives

From Ascot Moss <ascot.m...@gmail.com>
Subject Re: HDFS2 vs MaprFS
Date Mon, 06 Jun 2016 16:35:06 GMT
Hi Aaron, from the MapR site, [now HDFS2] "Limit to 50-200 million files", is
it really true?

On Tue, Jun 7, 2016 at 12:09 AM, Aaron Eng <aeng@maprtech.com> wrote:

> As I said, MapRFS has topologies.  You assign a volume (which is mounted
> at a directory path) to a topology, and in turn all the data for the volume
> (i.e. under the directory) is stored on the storage hardware assigned to
> the topology.
>
> These topological labels provide the same benefits as dfs.storage.policy
> as well as enabling additional types of use cases.
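>
> As an illustration, a minimal sketch of that assignment with the maprcli
> tool; the topology path, volume name, mount path, and server IDs here are
> hypothetical:
>
>   # Label a set of nodes with a topology (e.g. hardware with cold storage)
>   maprcli node move -serverids <server-ids> -topology /data/cold
>
>   # Create a volume mounted at a directory path, pinned to that topology
>   maprcli volume create -name archive -path /archive -topology /data/cold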
>
> On Mon, Jun 6, 2016 at 9:02 AM, Ascot Moss <ascot.moss@gmail.com> wrote:
>
>> In HDFS2, I can find "dfs.storage.policy"; for instance, HDFS2 allows
>> one to *apply the COLD storage policy to a directory*.
>> Where are these features in Mapr-FS?
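>>
>> (For reference, a minimal sketch of applying this in Hadoop 2.7+, with a
>> hypothetical /data/archive directory:)
>>
>>   # Pin a directory to the COLD policy (data stored on ARCHIVE media)
>>   hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD
>>
>>   # Verify the policy now in effect on the directory
>>   hdfs storagepolicies -getStoragePolicy -path /data/archive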
>>
>> On Mon, Jun 6, 2016 at 11:43 PM, Aaron Eng <aeng@maprtech.com> wrote:
>>
>>> >Since MapR  is proprietary, I find that it has many compatibility
>>> issues in Apache open source projects
>>>
>>> This is faulty logic. And rather than saying it has "many compatibility
>>> issues", perhaps you can describe one.
>>>
>>> Both MapRFS and HDFS are accessible through the same API.  The backend
>>> implementations are what differs.
>>>
>>> >Hadoop has a built-in storage policy named COLD, where is it in
>>> Mapr-FS?
>>>
>>> Long before HDFS had storage policies, MapRFS had topologies.  You can
>>> restrict particular types of storage to a topology and then assign a volume
>>> (subset of data stored in MapRFS) to the topology, and hence the data in
>>> that subset would be served by whatever hardware was mapped into the
>>> topology.
>>>
>>> >not to mention that Mapr-FS loses Data-Locality.
>>>
>>> This statement is false.
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 8:32 AM, Ascot Moss <ascot.moss@gmail.com> wrote:
>>>
>>>> Since MapR is proprietary, I find that it has many compatibility
>>>> issues in Apache open source projects, or even worse, loses Hadoop's
>>>> features.  For instance, Hadoop has a built-in storage policy named COLD;
>>>> where is it in Mapr-FS? Not to mention that Mapr-FS loses Data-Locality.
>>>>
>>>> On Mon, Jun 6, 2016 at 11:26 PM, Ascot Moss <ascot.moss@gmail.com>
>>>> wrote:
>>>>
>>>>> I don't think HDFS2 needs a SAN; using the QuorumJournal approach is
>>>>> much better than using the shared edits directory SAN approach.
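>>>>>
>>>>> (A minimal sketch of the QuorumJournal setting in hdfs-site.xml; the
>>>>> JournalNode hostnames and nameservice name here are hypothetical:)
>>>>>
>>>>>   <property>
>>>>>     <name>dfs.namenode.shared.edits.dir</name>
>>>>>     <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
>>>>>   </property>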
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Monday, June 6, 2016, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>>>>>
>>>>>> It is very common practice to back up the metadata in some SAN store,
>>>>>> so a complete loss of all the metadata is preventable. You could lose
>>>>>> a day's worth of data if, e.g., you back up the metadata once a day,
>>>>>> but you could do it more frequently. I'm not saying S3 or Azure Blob
>>>>>> are bad ideas.
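>>>>>>
>>>>>> (One way to take such a backup with stock HDFS tooling, for
>>>>>> illustration, assuming a hypothetical SAN mount at /mnt/san/nn-backup:)
>>>>>>
>>>>>>   # Pull the latest fsimage checkpoint from the active NameNode
>>>>>>   hdfs dfsadmin -fetchImage /mnt/san/nn-backup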
>>>>>>
>>>>>> On Sun, Jun 5, 2016 at 8:19 AM, Marcin Tustin <mtustin@handybook.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The namenode architecture is a source of fragility in HDFS. While a
>>>>>>> high availability deployment (with two namenodes, and a failover
>>>>>>> mechanism) means you're unlikely to see service interruption, it is
>>>>>>> still possible to have a complete loss of filesystem metadata with
>>>>>>> the loss of two machines.
>>>>>>>
>>>>>>> Secondly, because HDFS identifies datanodes by their hostname/IP,
>>>>>>> DNS changes can cause havoc with HDFS (see my war story on this here:
>>>>>>> https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab
>>>>>>> ).
>>>>>>>
>>>>>>> Also, the namenode/datanode architecture probably does contribute to
>>>>>>> the small files problem being a problem. That said, there are a lot
>>>>>>> of practical solutions for the small files problem.
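>>>>>>>
>>>>>>> (One such solution, as an illustrative sketch: packing small files
>>>>>>> into a Hadoop Archive; the paths here are hypothetical.)
>>>>>>>
>>>>>>>   # Pack everything under /data/small-files into a single .har file
>>>>>>>   hadoop archive -archiveName logs.har -p /data/small-files /data/archives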
>>>>>>>
>>>>>>> If you're just setting up a data infrastructure, I would say
>>>>>>> consider alternatives before you pick HDFS. If you run in AWS, S3 is
>>>>>>> a good alternative. If you run in some other cloud, it's probably
>>>>>>> worth considering whatever their equivalent storage system is.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <ascot.moss@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I read some (old?) articles from the Internet about Mapr-FS vs HDFS.
>>>>>>>>
>>>>>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>>>>>
>>>>>>>> It states that HDFS Federation has
>>>>>>>>
>>>>>>>> a) "Multiple Single Points of Failure", is it really true?
>>>>>>>> Why does MapR use HDFS rather than HDFS2 in its comparison, as this
>>>>>>>> would lead to an unfair (or even misleading) comparison? (HDFS was
>>>>>>>> from Hadoop 1.x, the old generation.) HDFS2 has been available since
>>>>>>>> 2013-10-15, and there is no Single Point of Failure in HDFS2.
>>>>>>>>
>>>>>>>> b) "Limit to 50-200 million files", is it really true?
>>>>>>>> I have seen so many real-world Hadoop clusters with over 10PB of
>>>>>>>> data, some even with 150PB.  If "Limit to 50-200 million files" were
>>>>>>>> true in HDFS2, why are there so many production Hadoop clusters in
>>>>>>>> the real world? How do they manage the issue of "Limit to 50-200
>>>>>>>> million files"? For instance, Facebook's "Like" implementation runs
>>>>>>>> on HBase at web scale; I can imagine HBase generates a huge number
>>>>>>>> of files in Facebook's Hadoop cluster, so the number of files there
>>>>>>>> should be much, much bigger than 50-200 million.
>>>>>>>>
>>>>>>>> From my point of view, in contrast, MaprFS should have a true
>>>>>>>> limitation of up to 1T files while HDFS2 can handle a truly
>>>>>>>> unlimited number of files; please do correct me if I am wrong.
>>>>>>>>
>>>>>>>> c) "Performance Bottleneck", again, is it really true?
>>>>>>>> MaprFS does not have a namenode, in order to gain file system
>>>>>>>> performance. Without a Namenode, MaprFS would lose Data Locality,
>>>>>>>> which is one of the beauties of Hadoop.  If Data Locality is no
>>>>>>>> longer available, any big data application running on MaprFS might
>>>>>>>> gain some file system performance, but it would totally lose the
>>>>>>>> true performance gain from the Data Locality provided by Hadoop's
>>>>>>>> namenode (gain small, lose big).
>>>>>>>>
>>>>>>>> d) "Commercial NAS required"
>>>>>>>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>>>>>>>> Federation?
>>>>>>>>
>>>>>>>> regards
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
