hadoop-common-user mailing list archives

From He Chen <airb...@gmail.com>
Subject Re: Any possible to set hdfs block size to a value smaller than 64MB?
Date Tue, 18 May 2010 15:11:35 GMT
If you know how to use AspectJ for aspect-oriented programming, you can
write an aspect class and let it monitor the whole MapReduce process.
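Absent a full AspectJ setup, the measurement idea can be sketched in plain Java: time a unit of work the way an around-advice would timestamp Hadoop's internal calls. The class and method names below are illustrative, not part of any Hadoop or AspectJ API.

```java
// Plain-Java stand-in for a timing aspect: wrap a unit of work and report
// wall-clock time. TaskTimer/timeMillis are hypothetical names for this sketch.
public class TaskTimer {
    // Run the given work and return the elapsed wall-clock milliseconds.
    public static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

With AspectJ, the same timestamps would be woven in as around-advice instead of an explicit wrapper call.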

On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles <patrick@cloudera.com> wrote:

> Should be evident in the total job running time... that's the only metric
> that really matters :)
>
> On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT <pierreact@gmail.com> wrote:
>
> > Thank you,
> > Any way I can measure the startup overhead in terms of time?
> >
> >
> > On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles <patrick@cloudera.com> wrote:
> >
> > > Pierre,
> > >
> > > Adding to what Brian has said (some things are not explicitly mentioned in
> > > the HDFS design doc)...
> > >
> > > - If you have small files that take up < 64MB, you do not actually use the
> > >   entire 64MB block on disk.
> > > - You *do* use up RAM on the NameNode, as each block represents meta-data
> > >   that needs to be maintained in-memory in the NameNode.
> > > - Hadoop won't perform optimally with very small block sizes. Hadoop I/O is
> > >   optimized for high sustained throughput per single file/block. There is a
> > >   penalty for doing too many seeks to get to the beginning of each block.
> > >   Additionally, you will have a MapReduce task per small file, and each
> > >   MapReduce task has a non-trivial startup overhead.
> > > - The recommendation is to consolidate your small files into large files.
> > >   One way to do this is via SequenceFiles... put the filename in the
> > >   SequenceFile key field, and the file's bytes in the SequenceFile value
> > >   field.
> > >
> > > In addition to the HDFS design docs, I recommend reading this blog post:
> > > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
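Patrick's consolidation idea above can be illustrated without Hadoop on the classpath: pack many small "files" into one container as (filename key, bytes value) records, the same layout a SequenceFile holds. This is a JDK-only sketch with hypothetical names, not the Hadoop `SequenceFile` API itself.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// JDK-only stand-in for the SequenceFile layout: each record is a
// length-prefixed (filename, file bytes) pair. In Hadoop proper you would
// use SequenceFile.createWriter with Text keys and BytesWritable values.
public class SmallFilePacker {
    // Append each (name, contents) pair as a length-prefixed record.
    public static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            out.writeUTF(e.getKey());          // key: original filename
            out.writeInt(e.getValue().length); // value length prefix
            out.write(e.getValue());           // value: the file's bytes
        }
        out.flush();
        return bos.toByteArray();
    }

    // Read the records back into a (filename -> bytes) map.
    public static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
        while (in.available() > 0) {
            String name = in.readUTF();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            files.put(name, value);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "a few lines".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "of text".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> back = unpack(pack(files));
        System.out.println(new String(back.get("a.txt"), StandardCharsets.UTF_8));
    }
}
```

One container file means one HDFS block entry (and one map task) instead of thousands, which is exactly the saving Patrick describes.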
> > >
> > > Happy Hadooping,
> > >
> > > - Patrick
> > >
> > > On Tue, May 18, 2010 at 9:11 AM, Pierre ANCELOT <pierreact@gmail.com> wrote:
> > >
> > > > Okay, thank you :)
> > > >
> > > >
> > > > On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> > > >
> > > > >
> > > > > On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:
> > > > >
> > > > > > Hi, thanks for this fast answer :)
> > > > > > If so, what do you mean by blocks? If a file has to be split, it
> > > > > > will be split when larger than 64MB?
> > > > > >
> > > > >
> > > > > For every 64MB of the file, Hadoop will create a separate block.  So,
> > > > > if you have a 32KB file, there will be one block of 32KB.  If the file
> > > > > is 65MB, then it will have one block of 64MB and another block of 1MB.
> > > > >
> > > > > Splitting files is very useful for load-balancing and distributing I/O
> > > > > across multiple nodes.  At 32KB / file, you don't really need to split
> > > > > the files at all.
> > > > >
> > > > > I recommend reading the HDFS design document for background issues
> > > > > like this:
> > > > >
> > > > > http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html
> > > > >
> > > > > Brian
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> > > > > >
> > > > > >> Hey Pierre,
> > > > > >>
> > > > > >> These are not traditional filesystem blocks - if you save a file
> > > > > >> smaller than 64MB, you don't lose 64MB of file space.
> > > > > >>
> > > > > >> Hadoop will use 32KB to store a 32KB file (ok, plus a KB of
> > > > > >> metadata or so), not 64MB.
> > > > > >>
> > > > > >> Brian
> > > > > >>
> > > > > >> On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>> I'm porting a legacy application to hadoop and it uses a bunch of
> > > > > >>> small files.
> > > > > >>> I'm aware that having such small files ain't a good idea but I'm
> > > > > >>> not making the technical decisions and the port has to be done for
> > > > > >>> yesterday...
> > > > > >>> Of course such small files are a problem; loading 64MB blocks for
> > > > > >>> a few lines of text is an evident loss.
> > > > > >>> What will happen if I set a smaller, or even way smaller (32kB),
> > > > > >>> block size?
> > > > > >>>
> > > > > >>> Thank you.
> > > > > >>>
> > > > > >>> Pierre ANCELOT.
> > > > > >>
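For reference, the setting Pierre asks about is a cluster-wide default with per-file overrides. Assuming the 0.20-era configuration (consistent with the thread's date), the default comes from `dfs.block.size` in `hdfs-site.xml`, expressed in bytes, so a 32MB default would be a sketch like:

```xml
<!-- hdfs-site.xml: cluster-wide default block size, in bytes (here 32MB).  -->
<!-- dfs.block.size is the 0.20-era property name; later releases renamed   -->
<!-- it dfs.blocksize. Shown as an illustration, not a recommended value.   -->
<property>
  <name>dfs.block.size</name>
  <value>33554432</value>
</property>
```

The block size can also be passed per file at create time (e.g. via the `FileSystem.create` overload that takes a block size), so the cluster default need not change just for one application.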
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > http://www.neko-consulting.com
> > > > > > Ego sum quis ego servo
> > > > > > "Je suis ce que je protège"
> > > > > > "I am what I protect"
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 
Best Wishes!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588
