hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: 0.18.1 datanode psuedo deadlock problem
Date Sat, 10 Jan 2009 04:59:00 GMT
I propose an alternate solution for this.
If the block information were maintained by an inotify task (on 
Linux/Solaris, or the Windows equivalent, whose name I forget), the 
datanode could be informed each time a file in the dfs tree is created, 
updated, or deleted.

With these notifications being delivered, the datanode can maintain an 
accurate block map with only one full scan of its blocks, at start time.

With this algorithm the data nodes will be able to scale to a much 
larger number of blocks.
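
As a rough illustration of the idea, here is a minimal sketch that keeps a 
block map current from file events after one initial scan. It is not the 
JNA-based code mentioned below and not Hadoop code: it assumes the standard 
java.nio.file.WatchService as a stand-in for inotify, and the class name 
EventDrivenBlockMap, its methods, and the "blk_" / ".meta" file naming rule 
are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

/** Hypothetical block map that is updated from file events instead of rescans. */
class EventDrivenBlockMap {
    // Block file name -> length in bytes as of the last scan or event.
    private final Map<String, Long> blocks = new ConcurrentHashMap<>();
    private final Path dir;

    EventDrivenBlockMap(Path dir) {
        this.dir = dir;
    }

    /** Assumed naming rule: block files start with "blk_", metadata ends in ".meta". */
    private static boolean isBlockName(String name) {
        return name.startsWith("blk_") && !name.endsWith(".meta");
    }

    /** The single full scan, intended to run once at start time. */
    void initialScan() throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (isBlockName(name) && Files.isRegularFile(p)) {
                    blocks.put(name, Files.size(p));
                }
            }
        }
    }

    /** Applies one create/modify/delete notification to the map. */
    void apply(WatchEvent.Kind<?> kind, Path file) {
        String name = file.getFileName().toString();
        if (!isBlockName(name)) {
            return;
        }
        if (kind == ENTRY_DELETE) {
            blocks.remove(name);
        } else if (kind == ENTRY_CREATE || kind == ENTRY_MODIFY) {
            try {
                blocks.put(name, Files.size(file));
            } catch (IOException e) {
                blocks.remove(name); // file vanished between the event and the stat
            }
        }
    }

    /** Blocks on the watch service and feeds events to apply() until the key is invalid. */
    void watch(WatchService ws) throws IOException, InterruptedException {
        dir.register(ws, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
        while (true) {
            WatchKey key = ws.take();
            for (WatchEvent<?> ev : key.pollEvents()) {
                if (ev.kind() == OVERFLOW) {
                    blocks.clear();   // events were lost, so fall back to a rescan
                    initialScan();
                } else {
                    apply(ev.kind(), dir.resolve((Path) ev.context()));
                }
            }
            if (!key.reset()) {
                return;
            }
        }
    }

    int size() {
        return blocks.size();
    }

    Long lengthOf(String name) {
        return blocks.get(name);
    }
}
```

A real version would need to register the watch before the initial scan so 
that no event is missed in between, and would have to register every 
subdirectory of the dfs tree, since WatchService is not recursive.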

The other thing is that the way the synchronized blocks on 
FSDataset.FSVolumeSet are held greatly aggravates this bug in 0.18.1.
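
To illustrate what narrower lock scope could look like, here is a minimal 
sketch. It is not the FSDataset code: the class VolumeSetSketch, its 
fields, and the slowLookup helper are hypothetical, and the real 
getNextVolume may do more work (for example checking free space) that this 
sketch ignores. The point is only that volume selection does not take the 
monitor, and that the report holds the monitor just long enough to copy the 
list it needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a volume set; not the Hadoop FSDataset code. */
class VolumeSetSketch {
    private final String[] volumes;                          // never modified after construction
    private final AtomicInteger next = new AtomicInteger();  // round-robin cursor
    private final List<String> blockNames = new ArrayList<>(); // guarded by "this"

    VolumeSetSketch(String... volumes) {
        this.volumes = volumes.clone();
    }

    /** Round-robin choice without the monitor, so writers do not queue behind a report. */
    String getNextVolume() {
        int i = Math.floorMod(next.getAndIncrement(), volumes.length);
        return volumes[i];
    }

    synchronized void addBlock(String name) {
        blockNames.add(name);
    }

    /** Holds the lock only while copying; the slow per-block work runs on the copy. */
    List<String> getBlockReport() {
        List<String> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(blockNames);
        }
        List<String> report = new ArrayList<>(snapshot.size());
        for (String name : snapshot) {
            report.add(slowLookup(name));
        }
        return report;
    }

    /** Placeholder for the expensive per-block stat; hypothetical. */
    private String slowLookup(String name) {
        return name + ":checked";
    }
}
```

The trade-off is that a report built from a snapshot can be slightly stale 
by the time it is sent, which a real change would have to account for.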

I have implemented a pure Java version of inotify using JNA 
(https://jna.dev.java.net/); a Windows version is also available, or 
some simple JNI could be written.

The jason@attributor.com address will be going away shortly; I will be 
switching to jason.hadoop@gmail.com in the next little bit.



Jason Venner wrote:
> The problem we are having is that datanodes periodically stall for 
> 10-15 minutes and drop off the active list and then come back.
>
> What is going on is that a long operation set is holding the lock on 
> FSDataset.volumes, and all of the other block service requests 
> stall behind this lock.
>
> "DataNode: [/data/dfs-video-18/dfs/data]" daemon prio=10 
> tid=0x4d7ad400 nid=0x7c40 runnable [0x4c698000..0x4c6990d0]
>   java.lang.Thread.State: RUNNABLE
>    at java.lang.String.lastIndexOf(String.java:1628)
>    at java.io.File.getName(File.java:399)
>    at org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148)
>    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
>    at org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
>    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511)
>    - locked <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
>    at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
>    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
>    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
>    at java.lang.Thread.run(Thread.java:619)
>
> This is basically taking a stat on every hdfs block on the datanode, 
> which in our case is ~2 million, and can take 10+ minutes. We may be 
> experiencing problems with our RAID controller but have no visibility 
> into it; at the OS level the file system seems fine and operations 
> eventually finish.
>
> It appears that a couple of different data structures are being locked 
> with the single object FSDataset$FSVolumeSet.
>
> Then this happens:
> "org.apache.hadoop.dfs.DataNode$DataXceiver@1bcee17" daemon prio=10 
> tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry 
> [0x459fe000..0x459ff0d0]
>   java.lang.Thread.State: BLOCKED (on object monitor)
>    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473)
>    - waiting to lock <0x551e8d48> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
>    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
>    - locked <0x54e550e0> (a org.apache.hadoop.dfs.FSDataset)
>    at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2322)
>    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)
>    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
>    at java.lang.Thread.run(Thread.java:619)
>
> This thread holds the lock on the FSDataset while waiting on the 
> volume set object, so all of the other datanode operations now stall 
> waiting on the FSDataset object.
> ----------
>
> Our particular installation doesn't use multiple directories for hdfs, 
> so a first simple hack for a local fix would be to modify 
> getNextVolume to just return the single volume and not be synchronized.
>
> A richer alternative would be to make the locking on 
> FSDataset$FSVolumeSet more fine-grained.
>
> Of course we are also trying to fix the file system performance and 
> dfs block loading that results in the block report taking a long time.
>
> Any suggestions or warnings?
>
> Thanks.
>
>
>
