hadoop-mapreduce-user mailing list archives

From Hayati Gonultas <hayati.gonul...@gmail.com>
Subject Re: HDFS2 vs MaprFS
Date Sun, 05 Jun 2016 14:33:26 GMT
I wrote 128 000 000 million in my previous post; that was incorrect (a
million million).

What I meant is 128 million.

1 GB of namenode RAM covers roughly 1 million files.
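
To make that rule of thumb concrete, here is a minimal back-of-the-envelope
sketch (the 1 GB per 1 million files figure is the assumption above, not an
exact spec; the real number depends on block counts and name lengths):

    // Rough namenode heap estimate, assuming the ~1 GB of heap per
    // ~1 million files rule of thumb mentioned above.
    public class NamenodeHeapEstimate {
        public static void main(String[] args) {
            long files = 128_000_000L;        // 128 million files
            double gbPerMillionFiles = 1.0;   // assumed rule of thumb
            double heapGb = files / 1_000_000.0 * gbPerMillionFiles;
            System.out.printf("~%.0f GB of namenode heap for %d files%n",
                    heapGb, files);           // prints: ~128 GB ...
        }
    }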
On 5 Jun 2016 16:58, "Ascot Moss" <ascot.moss@gmail.com> wrote:

> HDFS2 "Limit to 50-200 million files", is it really true like what MapR
> says?
>
> On Sun, Jun 5, 2016 at 7:55 PM, Hayati Gonultas <hayati.gonultas@gmail.com>
> wrote:
>
>> I forgot to mention the file system size limit.
>>
>> Yes, HDFS has a limit: for performance reasons the filesystem image is
>> read from disk into RAM and the rest of the work is done in RAM, so the
>> RAM must be big enough to hold the filesystem image. But HDFS has options
>> such as HAR files (Hadoop Archives) to work around this limitation.
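>>
>> As a minimal sketch (the archive name and paths below are made up), files
>> are first packed with the "hadoop archive" command and then read back
>> through a har:// URI, so millions of small entries collapse into a handful
>> of namenode objects:
>>
>>     // Sketch: reading a file packed into a Hadoop Archive (HAR), e.g.
>>     // created with: hadoop archive -archiveName logs.har -p /user/demo
>>     //                              logs /user/demo/out
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FSDataInputStream;
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.io.IOUtils;
>>
>>     public class HarReadExample {
>>         public static void main(String[] args) throws Exception {
>>             Configuration conf = new Configuration();
>>             // A har:// URI addresses a file inside the archive; the whole
>>             // archive costs the namenode only a few namespace objects.
>>             Path inHar = new Path(
>>                 "har:///user/demo/out/logs.har/logs/part-00000");
>>             try (FSDataInputStream in =
>>                      inHar.getFileSystem(conf).open(inHar)) {
>>                 IOUtils.copyBytes(in, System.out, 4096, false);
>>             }
>>         }
>>     }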
>>
>> On Sun, Jun 5, 2016 at 11:14 AM, Ascot Moss <ascot.moss@gmail.com> wrote:
>>
>>> Will the common pool of datanodes and namenode federation be a more
>>> effective alternative in HDFS2 than multiple clusters?
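>>>
>>> For reference, a federated client sees the partitioned namespaces through
>>> a single ViewFS mount table; a minimal sketch with placeholder hosts and
>>> paths:
>>>
>>>     // Sketch: ViewFS stitches several federated namenode namespaces
>>>     // into one client-side view, partitioning the metadata RAM load.
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>>
>>>     public class ViewFsExample {
>>>         public static void main(String[] args) throws Exception {
>>>             Configuration conf = new Configuration();
>>>             conf.set("fs.defaultFS", "viewfs://clusterX/");
>>>             // Each link maps a client path onto a different namenode,
>>>             // while all namenodes share the common pool of datanodes.
>>>             conf.set("fs.viewfs.mounttable.clusterX.link./user",
>>>                      "hdfs://nn1.example.com:8020/user");
>>>             conf.set("fs.viewfs.mounttable.clusterX.link./data",
>>>                      "hdfs://nn2.example.com:8020/data");
>>>             FileSystem fs = FileSystem.get(conf);
>>>             System.out.println(fs.exists(new Path("/data")));
>>>         }
>>>     }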
>>>
>>> On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle <daemeonr@gmail.com>
>>> wrote:
>>>
>>>> There are indeed many tuning points here. If the name nodes and journal
>>>> nodes can be larger, perhaps even bonding multiple 10 GbE NICs, one can
>>>> easily scale. I did have one client where the file counts forced multiple
>>>> clusters, but we were able to split them by airframe type ... e.g. fixed
>>>> wing in one, rotary subsonic in another, etc.
>>>>
>>>> sent from my mobile
>>>> Daemeon C.M. Reiydelle
>>>> USA 415.501.0198
>>>> London +44.0.20.8144.9872
>>>> On Jun 4, 2016 2:23 PM, "Gavin Yue" <yue.yuanyuan@gmail.com> wrote:
>>>>
>>>>> Here is what I found on the Hortonworks website.
>>>>>
>>>>>
>>>>> *Namespace scalability*
>>>>>
>>>>> While HDFS cluster storage scales horizontally with the addition of
>>>>> datanodes, the namespace does not. Currently the namespace can only be
>>>>> vertically scaled on a single namenode. The namenode stores the entire
>>>>> file system metadata in memory. This limits the number of blocks, files,
>>>>> and directories supported on the file system to what can be accommodated
>>>>> in the memory of a single namenode. A typical large deployment at Yahoo!
>>>>> includes an HDFS cluster with 2700-4200 datanodes with 180 million files
>>>>> and blocks, and addresses ~25 PB of storage. At Facebook, HDFS has around
>>>>> 2600 nodes, 300 million files and blocks, addressing up to 60 PB of
>>>>> storage. While these are very large systems and good enough for the
>>>>> majority of Hadoop users, a few deployments that might want to grow even
>>>>> larger could find the namespace scalability limiting.
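>>>>>
>>>>> A quick sanity check of those numbers, assuming the commonly cited ~150
>>>>> bytes of namenode heap per namespace object (a general HDFS sizing rule
>>>>> of thumb, not a figure from the text above):
>>>>>
>>>>>     // Rough heap needed for the quoted Yahoo! deployment.
>>>>>     public class NamespaceHeapCheck {
>>>>>         public static void main(String[] args) {
>>>>>             long objects = 180_000_000L; // files + blocks, quoted above
>>>>>             long bytesPerObject = 150;   // assumed rule of thumb
>>>>>             double heapGb = objects * bytesPerObject / 1e9;
>>>>>             System.out.printf("~%.0f GB of heap for %d objects%n",
>>>>>                     heapGb, objects);    // ~27 GB: big, but fits on
>>>>>         }                                // a single large machine
>>>>>     }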
>>>>>
>>>>>
>>>>>
>>>>> On Jun 4, 2016, at 04:43, Ascot Moss <ascot.moss@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I read some (old?) articles from the Internet about MapR-FS vs HDFS.
>>>>>
>>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>>
>>>>> It states that HDFS Federation has
>>>>>
>>>>> a) "Multiple Single Points of Failure", is it really true?
>>>>> Why does MapR compare against HDFS rather than HDFS2? That makes the
>>>>> comparison unfair (or even misleading): HDFS was from Hadoop 1.x, the
>>>>> old generation, while HDFS2 has been available since 2013-10-15 and has
>>>>> no single point of failure.
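>>>>>
>>>>> For reference, a minimal sketch of what an HDFS2 HA client configuration
>>>>> looks like (host names and the "mycluster" id are placeholders): two
>>>>> namenodes stand behind one logical nameservice, so losing the active
>>>>> namenode is not fatal:
>>>>>
>>>>>     import org.apache.hadoop.conf.Configuration;
>>>>>     import org.apache.hadoop.fs.FileSystem;
>>>>>
>>>>>     public class HaClientExample {
>>>>>         public static void main(String[] args) throws Exception {
>>>>>             Configuration conf = new Configuration();
>>>>>             conf.set("fs.defaultFS", "hdfs://mycluster");
>>>>>             conf.set("dfs.nameservices", "mycluster");
>>>>>             conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
>>>>>             conf.set("dfs.namenode.rpc-address.mycluster.nn1",
>>>>>                      "nn1.example.com:8020");
>>>>>             conf.set("dfs.namenode.rpc-address.mycluster.nn2",
>>>>>                      "nn2.example.com:8020");
>>>>>             // The client retries against the standby on failover.
>>>>>             conf.set("dfs.client.failover.proxy.provider.mycluster",
>>>>>                      "org.apache.hadoop.hdfs.server.namenode.ha."
>>>>>                      + "ConfiguredFailoverProxyProvider");
>>>>>             FileSystem fs = FileSystem.get(conf);
>>>>>             System.out.println(fs.getUri()); // hdfs://mycluster
>>>>>         }
>>>>>     }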
>>>>>
>>>>> b) "Limit to 50-200 million files", is it really true?
>>>>> I have seen so many real-world Hadoop clusters with over 10 PB of data,
>>>>> some even with 150 PB. If "Limit to 50-200 million files" were true in
>>>>> HDFS2, why are there so many production Hadoop clusters in the real
>>>>> world, and how do they manage the issue of "Limit to 50-200 million
>>>>> files"? For instance, Facebook's "Like" implementation runs on HBase at
>>>>> web scale; I can imagine HBase generates a huge number of files in
>>>>> Facebook's Hadoop cluster, so the number of files there should be much,
>>>>> much bigger than 50-200 million.
>>>>>
>>>>> From my point of view, in contrast, MaprFS has a hard limit of about 1
>>>>> trillion files, while HDFS2 can handle an effectively unlimited number
>>>>> of files; please correct me if I am wrong.
>>>>>
>>>>> c) "Performance Bottleneck", again, is it really true?
>>>>> MaprFS does away with the namenode in order to gain file system
>>>>> performance. But without a namenode, MaprFS loses Data Locality, which
>>>>> is one of the beauties of Hadoop. If Data Locality is no longer
>>>>> available, any big data application running on MaprFS might gain some
>>>>> file system performance, but it would lose the far larger performance
>>>>> gain that Hadoop's namenode provides through Data Locality (gain small,
>>>>> lose big).
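>>>>>
>>>>> To make Data Locality concrete, here is a minimal sketch (the path is a
>>>>> placeholder) of the namenode API that schedulers use to place
>>>>> computation next to the data:
>>>>>
>>>>>     import org.apache.hadoop.conf.Configuration;
>>>>>     import org.apache.hadoop.fs.BlockLocation;
>>>>>     import org.apache.hadoop.fs.FileStatus;
>>>>>     import org.apache.hadoop.fs.FileSystem;
>>>>>     import org.apache.hadoop.fs.Path;
>>>>>
>>>>>     public class LocalityExample {
>>>>>         public static void main(String[] args) throws Exception {
>>>>>             FileSystem fs = FileSystem.get(new Configuration());
>>>>>             FileStatus st =
>>>>>                 fs.getFileStatus(new Path("/data/events.log"));
>>>>>             // For each block the namenode reports which datanodes hold
>>>>>             // replicas; MapReduce schedules map tasks on those hosts.
>>>>>             for (BlockLocation loc :
>>>>>                     fs.getFileBlockLocations(st, 0, st.getLen())) {
>>>>>                 System.out.println(loc.getOffset() + " -> "
>>>>>                         + String.join(",", loc.getHosts()));
>>>>>             }
>>>>>         }
>>>>>     }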
>>>>>
>>>>> d) "Commercial NAS required"
>>>>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>>>>> Federation?
>>>>>
>>>>> regards
>>>>>
>>
>> --
>> Hayati Gonultas
