hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From alex kamil <alex.ka...@gmail.com>
Subject Re: Exponential performance decay when inserting large number of blocks
Date Thu, 14 Jan 2010 04:57:31 GMT
>launched 8 instances of the bin/hadoop fs -put utility
Zlatin, may be a silly question, are you running dfs -put locally on each
datanode,  or from a single box
Also where are you copying the data from, do you have local copies on each
node before the insert or all your files reside on a single server, or may
be on NFS?
i would also chk the network stats on datanodes and namenode and see if the
nics are not saturated, i guess you have enough bandwidth but may be there
is some issue with NIC on the namenode or something, i saw strange things
happening. you can probably monitor the number of conections/sockets,
bandwidth, IO waits, # of threads
if you are writing to dfs from a single location may be there is a problem
on a single node to handle all this outbound traffic, if you are
distributing files in parallel from multiple nodes, than mat be there is an
inbound congestion on namenode or something like that

if its not the case, i'd explore using distcp utility for copying data in
parallel  (it comes with the distro)
also if you really hit a wall, and have some time, i'd take look at
alternatives to Filesystem API, may be simething like Fuse-DFS and other
packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)

On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Err, ignore that attachment - attached the wrong graph with the right
> labels!
> Here's the right graph.
> -Todd
> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <eric@lifeless.net> wrote:
>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>> > Alex, Dhruba
>>> >
>>> > I repeated the experiment increasing the block size to 32k.  Still
>>> doing
>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>> > also running iostat on one of the datanodes.  Did not notice anything
>>> > that would explain an exponential slowdown.  There was more activity
>>> > while the inserts were active but far from the limits of the disk
>>> system.
>>> While creating many blocks, could it be that the replication pipe lining
>>> is eating up the available handler threads on the data nodes? By
>>> increasing the block size you would see better performance because the
>>> system spends more time writing data to local disk and less time dealing
>>> with things like replication "overhead." At a small block size, I could
>>> imagine you're artificially creating a situation where you saturate the
>>> default size configured thread pools or something weird like that.
>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>> this seems unlikely, but it might be worth looking into. The question is
>>> if testing with an artificially small block size like this is even a
>>> viable test. At some point the overhead of talking to the name node,
>>> selecting data nodes for a block, and setting up replication pipe lines
>>> could become some abnormally high percentage of the run time.
>> The concern isn't why the insertion is slow, but rather why the scaling
>> curve looks the way it does. Looking at the data, it looks like the
>> insertion rate (blocks per second) is actually related as 1/n where N is the
>> number of blocks. Attaching another graph of the same data which I think is
>> a little clearer to read.
>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>> end of your runtime (if the balancer daemon is running) and this is
>>> causing additional shuffling of data.
>> That's certainly one possibility.
>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>> let the cluster sit for a few hours, then come back and start another
>> insertion. Is the rate slow, or does it return to the fast starting speed?
>> -Todd

View raw message