hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Large data sets
Date Tue, 06 Feb 2007 19:20:27 GMT
Despite a common misconception, HDFS inodes do not store entire file
paths, just the names.
The underlying namespace data structure is pretty straightforward.
It reflects the actual namespace tree. Each directory node has pointers
to its children, which are kept in a TreeMap. So access to directory
entries is logarithmically reasonable (or reasonably logarithmic, if you
wish :-), i.e. O(log n) in the number of entries in the directory, unlike
traditional FSs that scan directory entries linearly.
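
To make the structure concrete, here is a minimal sketch of such a
namespace tree. It is an illustration only: the class and method names
(NamespaceNode, addChild, lookup, fullPath) are hypothetical and are not
the actual HDFS classes. Each node keeps only its own name plus a parent
pointer, and a directory keeps its children in a TreeMap keyed by name.

```java
import java.util.TreeMap;

// Hypothetical illustration of a namespace tree; NOT the real HDFS code.
class NamespaceNode {
    final String name;            // local name only, never the full path
    final NamespaceNode parent;   // null for the root
    // Children sorted by name; get/put cost O(log n) per directory.
    final TreeMap<String, NamespaceNode> children = new TreeMap<>();

    NamespaceNode(String name, NamespaceNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Returns the existing child with this name, or creates it.
    NamespaceNode addChild(String childName) {
        return children.computeIfAbsent(childName, n -> new NamespaceNode(n, this));
    }

    // Resolves a slash-separated path relative to this node, one
    // TreeMap lookup per path component. Returns null if not found.
    NamespaceNode lookup(String path) {
        NamespaceNode node = this;
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty piece before a leading slash
            }
            node = node.children.get(component);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    // The full path is rebuilt on demand by walking the parent pointers.
    String fullPath() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.fullPath();
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }
}
```

With this layout a directory holding 2M entries costs one name string per
entry, and finding one entry takes on the order of log2(2M), roughly 21,
comparisons rather than a scan over all entries.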

200 bytes per file is theoretically correct, but rather optimistic :-(
From the memory utilization of a real system I can see that HDFS uses
1.5-2K per file.
And since each real file is internally represented by two files (1 real
+ 1 crc), the real estimate per file should read 3-4K.
But this includes everything: block to data-node map, node to block
map, etc.
Storing a 2M-file directory will require 6-8GB (on 64-bit hardware),
but is still feasible.
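
The arithmetic behind that estimate can be written out as a small sketch.
The class and method names are made up for illustration, and decimal
units (1K = 1000 bytes, 1GB = 10^9 bytes) are an assumption chosen so the
result matches the round numbers above.

```java
// Hypothetical back-of-the-envelope calculator; not part of Hadoop.
class NameNodeMemoryEstimate {
    static final long BYTES_PER_GB = 1_000_000_000L; // decimal GB (assumption)

    // Each user-visible file is stored as 2 internal files (data + crc).
    static long bytesPerUserFile(long bytesPerInternalFile) {
        return Math.multiplyExact(2L, bytesPerInternalFile);
    }

    // Total NameNode memory for a given number of user-visible files.
    static long totalBytes(long userFiles, long bytesPerUserFile) {
        return Math.multiplyExact(userFiles, bytesPerUserFile);
    }

    static double toGB(long bytes) {
        return (double) bytes / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        long files = 2_000_000L;
        long low = totalBytes(files, bytesPerUserFile(1_500L));  // 1.5K observed
        long high = totalBytes(files, bytesPerUserFile(2_000L)); // 2K observed
        System.out.println(toGB(low) + " to " + toGB(high) + " GB");
    }
}
```

Plugging in the observed 1.5-2K per internal file gives 3-4K per real
file, and 2M such files come to 6-8GB.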

As far as I know, HDFS does not have explicit restrictions on the number
of files in one directory.
File path length is limited, but not the directory size.
I haven't heard of anybody trying to push it to the limit in this
direction yet.
This could be interesting.


Bryan A. P. Pendleton wrote:

> Looks like an interesting problem. The number of files stored in HDFS is,
> roughly, limited by the amount of memory the NameNode needs to track
> each file. Each file is going to require some number of bytes of storage
> on the NameNode: at a minimum, enough to represent its name (~full path
> size now, but could be made parent/differential to just the unique file
> prefix), the names of each block (so, 64 bits per 64MB, given the current
> defaults), and, for each block, a reference to the nodes that contain
> that block. So, a 100MB MapFile is going to require two filename strings,
> 3 64-bit block references, and at least a platform pointer for each block
> for each replication - I'd guess you could squeeze that to 100-200 bytes
> per file in a pinch, though I'd guess that it's more in the current
> implementation of the NameNode code.
> So, for your example numbers (a 200TB HBase, with, thus, 2M MapFiles),
> you'd need 2M*2*200 bytes, or something like 800MB of memory on the
> NameNode. So, *shrug*, it sounds theoretically possible, and probably
> means it's not the point to spend time optimizing against right now.
> By the way - I'm at PARC, but hide behind a personal e-mail address for
> Hadoop discussion. Fun to see what you guys at Powerset are working on. :)
> On 2/5/07, Jim Kellerman <jim@powerset.com> wrote:
>> Bryan,
>> Storing many low-record-count files is not what I am concerned about so
>> much as storing many high-record-count files. In the example
>> (http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture#example),
>> we are talking about 2M MapFiles to hold the data for one column of a
>> particular table. Now they have to have some name, so does that mean
>> they need to live in an HDFS directory, or can files live outside the
>> HDFS directory space even if they are to be persistent? (This is an area
>> of HDFS I haven't explored much, so I don't know if we have to do
>> something to make it work.) Certainly you could not put 2M files in a
>> Unix directory and expect it to work. I'm just trying to understand if
>> there are similar limitations in Hadoop.
>> Any light you could shed on the matter would be greatly appreciated.
>> Thanks.
>> -Jim
>> On 2/2/07 12:21 PM, "Bryan A. P. Pendleton" <bp@geekdom.net> wrote:
>> > We have a cluster of about 40 nodes, with about 14TB of aggregate raw
>> > storage. At peak times, I have had up to 3 or 4 terabytes of data
>> > stored in HDFS, in probably 100-200k files.
>> >
>> > To make things work for my tasks, I had to hash through a few different
>> > tricks for dealing with large sets of data - not all of the tools you
>> > might like for combining different sequential streams of data in Hadoop
>> > are around. In particular, running MapReduce processes to re-key or
>> > variously mix sequential inputs for further processing can be
>> > problematic when your dataset is already taxing your storage. If you
>> > read through the history of this list, you'll see that I'm often
>> > agitating about bugs in handling low-disk-space conditions, storage
>> > balancing, and problems related to numbers of simultaneous open files.
>> >
>> > I haven't generally run into files-per-directory problems, because I
>> > introduce my data into SequenceFile or MapFile formats as soon as
>> > possible, then do work across segments of that. Storing individual
>> > low-record-count files in HDFS is definitely a no-no given the current
>> > limits of the system.
>> >
>> > Feel free to write me off-list if you want to know more particulars of
>> > how I've been using the system.
>> >
>> > On 2/2/07, Jim Kellerman <jim@powerset.com> wrote:
>> >>
>> >> I am part of a working group that is developing a Bigtable-like
>> >> structured storage system for Hadoop HDFS (see
>> >> http://wiki.apache.org/lucene-hadoop/Hbase).
>> >>
>> >> I am interested in learning about large HDFS installations:
>> >>
>> >> - How many nodes do you have in a cluster?
>> >>
>> >> - How much data do you store in HDFS?
>> >>
>> >> - How many files do you have in HDFS?
>> >>
>> >> - Have you run into any limitations that have prevented you from
>> >>   growing your application?
>> >>
>> >> - Are there limitations in how many files you can put in a single
>> >>   directory?
>> >>
>> >>   Google's GFS, for example, does not really implement directories
>> >>   per se, so it does not suffer from performance problems related to
>> >>   having too many files in a directory as traditional file systems do.
>> >>
>> >> The largest system I know about has about 1.5M files and about 150GB
>> >> of data. If anyone has a larger system in use, I'd really like to hear
>> >> from you. Were there particular obstacles you had in growing your
>> >> system to that size, etc?
>> >>
>> >> Thanks in advance.
>> >> --
>> >> Jim Kellerman, Senior Engineer; Powerset                jim@powerset.com
>> >>
>> >
>> >
>> -- 
>> Jim Kellerman, Senior Engineer; Powerset                jim@powerset.com
