hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: 0.18.1 datanode pseudo deadlock problem
Date Fri, 09 Jan 2009 19:46:18 GMT
Hi Jason,

2 million blocks per data-node is not going to work.
There have been discussions about this previously;
please check the mail archives.

This means you have a lot of very small files, which
HDFS is not designed to support. A general recommendation
is to group small files into large ones, introducing
some kind of record structure to delimit those small files,
and to manage it at the application level.
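
For example, here is a rough sketch of that pattern using SequenceFile,
with the original file name as the key and the raw file bytes as the
value (untested; the class name and error handling are illustrative
only):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  // Packs every file in srcDir into one SequenceFile at dest:
  // key = original file name, value = raw file bytes.
  public class SmallFilePacker {
    public static void pack(Configuration conf, Path srcDir, Path dest)
        throws IOException {
      FileSystem fs = FileSystem.get(conf);
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, dest, Text.class, BytesWritable.class);
      try {
        for (FileStatus stat : fs.listStatus(srcDir)) {
          if (stat.isDir()) continue;          // skip subdirectories
          byte[] buf = new byte[(int) stat.getLen()];
          FSDataInputStream in = fs.open(stat.getPath());
          try {
            in.readFully(0, buf);              // slurp the small file
          } finally {
            in.close();
          }
          writer.append(new Text(stat.getPath().getName()),
                        new BytesWritable(buf));
        }
      } finally {
        writer.close();
      }
    }
  }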

Thanks,
--Konstantin


Jason Venner wrote:
> The problem we are having is that datanodes periodically stall for 10-15 
> minutes and drop off the active list and then come back.
> 
> What is going on is that a long-running operation is holding the lock 
> on FSDataset.volumes, and all of the other block service requests stall 
> behind this lock.
> 
> "DataNode: [/data/dfs-video-18/dfs/data]" daemon prio=10 tid=0x4d7ad400 
> nid=0x7c40 runnable [0x4c698000..0x4c6990d0]
>   java.lang.Thread.State: RUNNABLE
>    at java.lang.String.lastIndexOf(String.java:1628)
>    at java.io.File.getName(File.java:399)
>    at 
> org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148)

> 
>    at 
> org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
>    at 
> org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
>    at 
> org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511) 
> 
>    - locked <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
>    at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
>    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
>    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
>    at java.lang.Thread.run(Thread.java:619)
> 
> This is basically taking a stat on every HDFS block on the datanode, 
> which in our case is ~2 million, and can take 10+ minutes (we may be 
> experiencing problems with our RAID controller, but have no visibility 
> into it). At the OS level the file system seems fine, and operations 
> eventually finish.
> 
> It appears that a couple of different data structures are being locked 
> via the single object FSDataset$FSVolumeSet.
> 
> Then this happens:
> "org.apache.hadoop.dfs.DataNode$DataXceiver@1bcee17" daemon prio=10 
> tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry 
> [0x459fe000..0x459ff0d0]
>   java.lang.Thread.State: BLOCKED (on object monitor)
>    at 
> org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473) 
> 
>    - waiting to lock <0x551e8d48> (a 
> org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
>    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
>    - locked <0x54e550e0> (a org.apache.hadoop.dfs.FSDataset)
>    at 
> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2322)
>    at 
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)
>    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
>    at java.lang.Thread.run(Thread.java:619)
> 
> This thread locks the FSDataset while waiting on the FSVolumeSet 
> object, and now all of the DataNode operations stall waiting on the 
> FSDataset object.
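> 
> A self-contained toy that reproduces the pile-up (plain Java, no Hadoop 
> code; the thread names only mirror the traces above):
> 
>   public class TwoLockStall {
>     static final Object volumeSet = new Object(); // stands in for FSVolumeSet
>     static final Object dataset = new Object();   // stands in for FSDataset
> 
>     public static void main(String[] args) {
>       // 1. The block report: a long scan under the volumeSet monitor.
>       new Thread(new Runnable() {
>         public void run() {
>           synchronized (volumeSet) { sleep(10000); } // ~2M stats here
>         }
>       }, "offerService").start();
>       sleep(100);
>       // 2. A writer: locks dataset, then blocks waiting for volumeSet,
>       //    like writeToBlock -> getNextVolume.
>       new Thread(new Runnable() {
>         public void run() {
>           synchronized (dataset) {
>             synchronized (volumeSet) { }
>           }
>         }
>       }, "DataXceiver-1").start();
>       sleep(100);
>       // 3. Every other dataset operation now queues behind thread 2.
>       new Thread(new Runnable() {
>         public void run() {
>           synchronized (dataset) { }
>         }
>       }, "DataXceiver-2").start();
>     }
> 
>     static void sleep(long ms) {
>       try { Thread.sleep(ms); } catch (InterruptedException e) { }
>     }
>   }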
> ----------
> 
> Our particular installation doesn't use multiple directories for HDFS, 
> so a first simple hack for a local fix would be to modify getNextVolume 
> to just return the single volume and not be synchronized; see the 
> sketch below.
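> 
> Something like this (a sketch against the 0.18 source, not a tested 
> patch; the method and field names follow the stack trace, and the 
> space check is our guess at what the synchronized version does):
> 
>   // FSDataset.FSVolumeSet -- single-volume shortcut, no monitor taken.
>   // Safe only because volumes[] never changes after startup in our setup.
>   FSVolume getNextVolume(long blockSize) throws DiskOutOfSpaceException {
>     if (volumes[0].getAvailable() < blockSize) {
>       throw new DiskOutOfSpaceException(
>           "Insufficient space for an additional block");
>     }
>     return volumes[0];
>   }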
> 
> A richer alternative would be to make the locking more fine-grained on 
> FSDataset$FSVolumeSet, for instance along the lines of the snapshot 
> sketch below.
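> 
> For the block report itself, one shape this could take (illustrative 
> only, untested): copy the volume references under the monitor, then do 
> the slow per-block scan without holding it, so getNextVolume is never 
> blocked for minutes:
> 
>   // FSDataset.FSVolumeSet -- hold the monitor only to snapshot refs.
>   void getBlockInfo(TreeSet<Block> blockSet) {
>     FSVolume[] snapshot;
>     synchronized (this) {
>       snapshot = (FSVolume[]) volumes.clone(); // cheap, O(#volumes)
>     }
>     for (int i = 0; i < snapshot.length; i++) {
>       snapshot[i].getBlockInfo(blockSet);      // slow disk walk, lock-free
>     }
>   }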
> 
> Of course, we are also trying to fix the file system performance and 
> DFS block loading that result in the block report taking so long.
> 
> Any suggestions or warnings?
> 
> Thanks.
> 
