hadoop-common-user mailing list archives

From "Jeff Eastman" <j...@windwardsolutions.com>
Subject RE: Performance / cluster scaling question
Date Fri, 21 Mar 2008 21:30:49 GMT
What's your replication factor? 
Jeff
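
(For context, a sketch of where the replication factor lives in Hadoop 0.16: it is the dfs.replication property, set in hadoop-site.xml to override the default of 3 from hadoop-default.xml. The fragment below is illustrative, not taken from the original poster's configuration.)

```xml
<!-- hadoop-site.xml: replication factor applied to newly created DFS files.
     The shipped default (hadoop-default.xml) is 3. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

With the default of 3, every concurrent file write becomes a three-datanode write pipeline, so ~2250 concurrent writes would mean roughly 6750 simultaneous block-write streams spread over only 8 datanodes.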

> -----Original Message-----
> From: André Martin [mailto:mail@andremartin.de]
> Sent: Friday, March 21, 2008 2:25 PM
> To: core-user@hadoop.apache.org
> Subject: Performance / cluster scaling question
> 
> Hi everyone,
> I ran a distributed system that consists of 50 spiders/crawlers and 8
> server nodes with a Hadoop DFS cluster with 8 datanodes and a namenode...
> Each spider has 5 job processing / data crawling threads and puts
> crawled data as one complete file onto the DFS - additionally there are
> splits created for each server node that are put as files onto the DFS
> as well. So basically there are 50*5*9 = ~2250 concurrent writes across
> 8 datanodes.
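> The write-load arithmetic above can be sketched as follows (a
> back-of-the-envelope check only; the class and variable names are
> illustrative, not from the actual spider code):
>
> ```java
> // Back-of-the-envelope check of the concurrent DFS write load described above.
> // All counts are taken from the post; the names are hypothetical.
> public class WriteLoad {
>     public static void main(String[] args) {
>         int spiders = 50;                     // crawler processes
>         int threadsPerSpider = 5;             // crawling threads per spider
>         int serverNodes = 8;                  // one split file per server node
>         int filesPerThread = 1 + serverNodes; // 1 complete file + 8 splits = 9
>         int concurrentWrites = spiders * threadsPerSpider * filesPerThread;
>         System.out.println(concurrentWrites); // prints 2250
>     }
> }
> ```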
> The splits are read by the server nodes and deleted afterwards,
> so those split files exist for only a few seconds to minutes...
> Since 99% of the files are smaller than 64 MB (the default block
> size), I expected the number of blocks to be roughly equal to the number
> of files. However, after running the system for 24 hours, the namenode WebUI
> shows 423763 files and directories but 1480735 blocks. My assumption is that
> the system cannot keep up with deleting all the invalidated blocks - is
> that right?
> Also, I noticed that the overall performance of the cluster degrades
> over time (see attached image).
> There are a bunch of "Could not get block locations. Aborting..."
> exceptions, and they seem to appear more frequently towards the end of
> the experiment:
> > java.io.IOException: Could not get block locations. Aborting...
> >     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
> >     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
> >     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
> So, is the cluster simply saturated by such frequent creation and
> deletion of files, or is the network the actual bottleneck? The
> workload does not change at all during the whole experiment.
> On the cluster side I see lots of the following exceptions:
> > 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> > PacketResponder 1 for block blk_6757062148746339382 terminating
> > 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> > writeBlock blk_6757062148746339382 received exception java.io.EOFException
> > 2008-03-21 20:28:05,411 ERROR org.apache.hadoop.dfs.DataNode:
> > 141.xxx.xxx.xxx:50010:DataXceiver: java.io.EOFException
> >     at java.io.DataInputStream.readInt(Unknown Source)
> >     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >     at java.lang.Thread.run(Unknown Source)
> > 2008-03-21 19:26:46,535 INFO org.apache.hadoop.dfs.DataNode:
> > writeBlock blk_-7369396710977076579 received exception
> > java.net.SocketException: Connection reset
> > 2008-03-21 19:26:46,535 ERROR org.apache.hadoop.dfs.DataNode:
> > 141.xxx.xxx.xxx:50010:DataXceiver: java.net.SocketException:
> > Connection reset
> >     at java.net.SocketInputStream.read(Unknown Source)
> >     at java.io.BufferedInputStream.fill(Unknown Source)
> >     at java.io.BufferedInputStream.read(Unknown Source)
> >     at java.io.DataInputStream.readInt(Unknown Source)
> >     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >     at java.lang.Thread.run(Unknown Source)
> I'm running Hadoop 0.16.1 - has anyone had the same or a similar
> experience?
> How can the performance degradation be avoided? More datanodes? And why
> does block deletion not seem to keep up with file deletion?
> Thanks in advance for your insights, ideas & suggestions :-)
> 
> Cu on the 'net,
>                         Bye - bye,
> 
>                                    <<<<< André <<<< >>>> èrbnA >>>>>



