hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Any possible to set hdfs block size to a value smaller than 64MB?
Date Tue, 18 May 2010 18:57:44 GMT

Hey Konstantin,

Interesting paper :)

One thing which I've been kicking around lately is "at what scale does the file/directory
paradigm break down?"

At some point, I think the human mind can no longer comprehend so many files (certainly, I
can barely organize the few thousand files on my laptop).  Thus, file growth comes from (a)
lots of humans using a single file system or (b) automated programs generating the files.
For (a), you don't need a central global namespace; you just need the ability to have a "local"
namespace per person that can be shared among friends.  For (b), a program isn't going to
be upset if you replace a file system with a database / dataset object / bucket.

Two examples:
- structural biology: I've seen a lot of different analysis workflows (such as AutoDock) that
compare a protein against a "database" of ligands, where the database is 80,000 O(4KB) files.
Each file represents a known ligand that the biologist might come back and examine if it is
relevant to the study of their protein.
- high-energy physics: Each detector can produce millions of events a night, and experiments
will produce many billions of events.  These are saved into files (each file containing hundreds
or thousands of events); these files are kept in collections (called datasets, data blocks,
lumi sections, or runs, depending on what you're doing).  Tasks are run against datasets;
they will output smaller datasets which the physicists will iterate upon until they get some
dataset which fits onto their laptop.
Here's my Claim: The biologists have a small enough number of objects to manage each one as
a separate file; they do this because it's easier for humans navigating around in a terminal.
The physicists have such a huge number of objects that there's no way to manage them using
one file per object, so they utilize files only as a mechanism to serialize bytes of data
and have higher-order data structures for management.

Here's my Question: at what point do you move from the biologist's model (named objects, managed
independently, single files) to the physicist's model (anonymous objects, managed in large
groups, files are only used because we save data on file systems)?

Another way to look at this is to consider DNS.  DNS maintains the namespace of the globe,
but appears to do this just fine without a single central catalog.  If you start with a POSIX
filesystem namespace (and the guarantees it implies), what rules must you relax in order to
arrive at DNS?  On the scale of managing a million (billion? ten billion? trillion?) files,
are any of the assumptions relevant?

I don't know the answers to these questions, but I suspect they will become important over
the next 10 years.

Brian

PS - I started thinking along these lines during MSST when the LLNL guy was speculating about
what it meant to "fsck" a file system with 1 trillion files.

On May 18, 2010, at 12:56 PM, Konstantin Shvachko wrote:

> You can also get some performance numbers and answers to the block size dilemma here:
> 
> http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html
> 
> I remember some people were using Hadoop for storing or streaming videos.
> Don't know how well that worked.
> It would be interesting to learn about your experience.
> 
> Thanks,
> --Konstantin
> 
> 
> On 5/18/2010 8:41 AM, Brian Bockelman wrote:
>> Hey Hassan,
>> 
>> 1) The overhead is pretty small, measured in a small number of milliseconds on average.
>> 2) HDFS is not designed for "online latency".  Even though the average is small, if
>> something "bad happens", your clients might experience a lot of delays while going through
>> the retry stack.  The initial design was for batch processing, and latency-sensitive
>> applications came later.
>> 
>> Additionally, since the NN is a SPOF, you might want to consider your uptime requirements.
>> Each organization will have to balance these risks with the advantages (such as much
>> cheaper hardware).
>> 
>> There's a nice interview with the GFS authors here where they touch upon the latency
>> issues:
>> 
>> http://queue.acm.org/detail.cfm?id=1594206
>> 
>> As GFS and HDFS share many design features, the theoretical parts of their discussion
>> might be useful for you.
>> 
>> As far as overall throughput of the system goes, it depends heavily upon your implementation
>> and hardware.  Our HDFS routinely serves 5-10 Gbps.
>> 
>> Brian
>> 
>> On May 18, 2010, at 10:29 AM, Nyamul Hassan wrote:
>> 
>>> This is a very interesting thread to us, as we are thinking about deploying
>>> HDFS as massive online storage for an online university, and then
>>> serving the video files to students who want to view them.
>>> 
>>> We cannot control the size of the videos (and some class work files), as
>>> they will mostly be uploaded by the teachers providing the classes.
>>> 
>>> How would the overall throughput of HDFS be affected in such a solution?
>>> Would HDFS be feasible at all for such a setup?
>>> 
>>> Regards
>>> HASSAN
>>> 
>>> 
>>> 
>>> On Tue, May 18, 2010 at 21:11, He Chen <airbots@gmail.com> wrote:
>>> 
>>>> If you know how to use AspectJ to do aspect-oriented programming, you can
>>>> write an aspect class and let it monitor the whole MapReduce process.
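
[A minimal, purely illustrative sketch of the kind of aspect described above, using
annotation-style AspectJ against the old org.apache.hadoop.mapred.Mapper API; the class
name and pointcut are my own assumptions, and the aspect still has to be woven into the
task JVM.]

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class MapTimingAspect {

    // Matches any implementation of the old-API Mapper.map(..)
    @Around("execution(* org.apache.hadoop.mapred.Mapper+.map(..))")
    public Object timeMap(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();   // run the real map() call
        } finally {
            long micros = (System.nanoTime() - start) / 1000;
            System.err.println("map() took " + micros + " us");   // ends up in the task logs
        }
    }
}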
>>>> 
>>>> On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles <patrick@cloudera.com> wrote:
>>>> 
>>>>> Should be evident in the total job running time... that's the only metric
>>>>> that really matters :)
>>>>> 
>>>>> On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT <pierreact@gmail.com> wrote:
>>>>> 
>>>>>> Thank you,
>>>>>> Any way I can measure the startup overhead in terms of time?
>>>>>> 
>>>>>> 
>>>>>> On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles <patrick@cloudera.com> wrote:
>>>>>> 
>>>>>>> Pierre,
>>>>>>> 
>>>>>>> Adding to what Brian has said (some things are not explicitly mentioned in the HDFS
>>>>>>> design doc)...
>>>>>>> 
>>>>>>> - If you have small files that take up < 64MB, you do not actually use the entire
>>>>>>> 64MB block on disk.
>>>>>>> - You *do* use up RAM on the NameNode, as each block represents meta-data that needs
>>>>>>> to be maintained in-memory in the NameNode.
>>>>>>> - Hadoop won't perform optimally with very small block sizes.  Hadoop I/O is optimized
>>>>>>> for high sustained throughput per single file/block.  There is a penalty for doing too
>>>>>>> many seeks to get to the beginning of each block.  Additionally, you will have a
>>>>>>> MapReduce task per small file.  Each MapReduce task has a non-trivial startup overhead.
>>>>>>> - The recommendation is to consolidate your small files into large files.  One way to
>>>>>>> do this is via SequenceFiles... put the filename in the SequenceFile key field, and the
>>>>>>> file's bytes in the SequenceFile value field.
>>>>>>> 
>>>>>>> In addition to the HDFS design docs, I recommend reading this blog post:
>>>>>>> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
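
[A hedged, minimal sketch of the SequenceFile consolidation idea above, not Patrick's code:
pack a local directory of small files into one SequenceFile, filename as key, raw bytes as
value.  Assumes the 0.20-era API; the argument handling and paths are made-up examples.]

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[1]);                      // e.g. /user/me/ligands.seq
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (File f : new File(args[0]).listFiles()) { // local directory of small files
                byte[] bytes = new byte[(int) f.length()];
                FileInputStream in = new FileInputStream(f);
                try {
                    IOUtils.readFully(in, bytes, 0, bytes.length);
                } finally {
                    in.close();
                }
                // filename as key, file contents as value
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}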
>>>>>>> 
>>>>>>> Happy Hadooping,
>>>>>>> 
>>>>>>> - Patrick
>>>>>>> 
>>>>>>> On Tue, May 18, 2010 at 9:11 AM, Pierre ANCELOT <pierreact@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Okay, thank you :)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, thanks for this fast answer :)
>>>>>>>>>> If so, what do you mean by blocks? If a file has to be split, it will be split
>>>>>>>>>> when larger than 64MB?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> For every 64MB of the file, Hadoop will create a separate block.  So, if you have
>>>>>>>>> a 32KB file, there will be one block of 32KB.  If the file is 65MB, then it will
>>>>>>>>> have one block of 64MB and another block of 1MB.
>>>>>>>>> 
>>>>>>>>> Splitting files is very useful for load-balancing and distributing I/O across
>>>>>>>>> multiple nodes.  At 32KB / file, you don't really need to split the files at all.
>>>>>>>>> 
>>>>>>>>> I recommend reading the HDFS design document for background issues like this:
>>>>>>>>> 
>>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html
>>>>>>>>> 
>>>>>>>>> Brian
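
[An illustrative sketch, not from the thread: one way to see the block layout Brian describes,
by asking the FileSystem API for a file's block locations.  The class name and usage are my
own; a 65MB file with the default 64MB block size should show two entries, 64MB and 1MB.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);      // uses fs.default.name from core-site.xml
        FileStatus stat = fs.getFileStatus(new Path(args[0]));
        // One BlockLocation per block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset() + " length=" + b.getLength());
        }
        fs.close();
    }
}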
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hey Pierre,
>>>>>>>>>>> 
>>>>>>>>>>> These are not traditional filesystem blocks - if you save a file smaller than
>>>>>>>>>>> 64MB, you don't lose 64MB of file space.
>>>>>>>>>>> 
>>>>>>>>>>> Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so), not
>>>>>>>>>>> 64MB.
>>>>>>>>>>> 
>>>>>>>>>>> Brian
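
[Also purely illustrative, since the thread subject asks how to set a smaller block size: the
0.20-era FileSystem.create() overload lets you request a non-default block size for a single
file.  The path, replication factor, and 1MB figure below are made-up examples, not a
recommendation.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallBlockWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long blockSize = 1L * 1024 * 1024;                 // 1MB instead of the 64MB default
        FSDataOutputStream out = fs.create(
                new Path("/tmp/small-block-example"),      // hypothetical path
                true,                                      // overwrite if it exists
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 3,                                 // replication factor
                blockSize);
        out.writeBytes("a few lines of text\n");
        out.close();
        fs.close();
    }
}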
>>>>>>>>>>> 
>>>>>>>>>>> On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I'm porting a legacy application to Hadoop and it uses a bunch of small files.
>>>>>>>>>>>> I'm aware that having such small files ain't a good idea, but I'm not making the
>>>>>>>>>>>> technical decisions and the port has to be done for yesterday...
>>>>>>>>>>>> Of course such small files are a problem; loading 64MB blocks for a few lines of
>>>>>>>>>>>> text is an evident waste.
>>>>>>>>>>>> What will happen if I set a smaller, or even way smaller (32kB), block size?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>> 
>>>>>>>>>>>> Pierre ANCELOT.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> http://www.neko-consulting.com
>>>>>>>>>> Ego sum quis ego servo
>>>>>>>>>> "Je suis ce que je protège"
>>>>>>>>>> "I am what I protect"
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> http://www.neko-consulting.com
>>>>>>>> Ego sum quis ego servo
>>>>>>>> "Je suis ce que je protège"
>>>>>>>> "I am what I protect"
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> http://www.neko-consulting.com
>>>>>> Ego sum quis ego servo
>>>>>> "Je suis ce que je protège"
>>>>>> "I am what I protect"
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Wishes!
>>>> Best regards!
>>>> 
>>>> --
>>>> Chen He
>>>> (402)613-9298
>>>> PhD. student of CSE Dept.
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> Lincoln NE 68588
>>>> 
>> 

