hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
Date Wed, 28 Mar 2007 02:58:12 GMT
We should file a Jira on it.

I agree checkDir() is called too many times and is too expensive. I
don't think it serves any important or essential purpose, so I vote for
removing it. Has anyone ever seen this check fail, or seen its failure
be useful for cluster functionality?

Raghu.
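
For reference, here is a simplified sketch of the kind of recursive
check the stack trace below shows. This is hypothetical illustration
code, not the actual DiskChecker/FSDataset source: every entry in the
tree triggers a native access check, so a data directory holding some
180,000 block files costs on the order of 180,000 system calls per
invocation.

import java.io.File;

// Hypothetical, simplified version of a recursive directory health
// check. Every canRead() call goes through the native
// UnixFileSystem.checkAccess() seen at the top of the stack trace.
public class DirTreeCheckSketch {
    static void checkDirTree(File dir) {
        if (!dir.canRead()) {            // one native check per entry
            throw new RuntimeException("can not read directory " + dir);
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return;                      // not a directory, or I/O error
        }
        for (File child : children) {
            if (child.isDirectory()) {
                checkDirTree(child);     // matches the repeated
            }                            // checkDirTree frames below
        }
    }
}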

Igor Bolotin wrote:
> While investigating performance issues in our Hadoop DFS/MapReduce
> cluster I saw very high CPU usage by DataNode processes.
> 
> The stack trace showed the following on most of the data nodes:
> 
>  
> 
> "org.apache.hadoop.dfs.DataNode$DataXceiveServer@528acf6e" daemon prio=1
> tid=0x00002aaacb5b7bd0 nid=0x5940 runnable
> [0x000000004166a000..0x000000004166ac00]
> 
>         at java.io.UnixFileSystem.checkAccess(Native Method)
> 
>         at java.io.File.canRead(File.java:660)
> 
>         at
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:34)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:164)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSVolume.checkDirs(FSDataset.java:258)
> 
>         at
> org.apache.hadoop.dfs.FSDataset$FSVolumeSet.checkDirs(FSDataset.java:339
> )
> 
>         - locked <0x00002aaab6fb8960> (a
> org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
> 
>         at
> org.apache.hadoop.dfs.FSDataset.checkDataDir(FSDataset.java:544)
> 
>         at
> org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:535)
> 
>         at java.lang.Thread.run(Thread.java:595)
> 
>  
> 
> I understand that it would take a while to check the entire data
> directory - we have some 180,000 blocks/files in there. But what really
> bothers me is that, from the code, this check is executed for every
> client connection to the DataNode - which also means for every task
> executed in the cluster. Once I commented out the check and restarted
> the datanodes, performance went up and CPU usage dropped to a
> reasonable level.
> 
>  
> 
> Now the question is: am I missing something here, or should this check
> really be removed?
> 
>  
> 
> Best regards,
> 
> Igor Bolotin
> www.collarity.com
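
If removing the check outright seems risky, a lighter-weight alternative
would be to throttle it so that it runs at most once per interval, no
matter how many connections arrive. A minimal sketch; the names here
(maybeCheckDataDirs, MIN_CHECK_INTERVAL_MS) are made up for illustration
and are not Hadoop APIs:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical throttle: only one caller per interval pays the cost of
// the expensive directory scan; all other callers return immediately.
public class ThrottledCheck {
    private static final long MIN_CHECK_INTERVAL_MS = 10 * 60 * 1000L;
    private final AtomicLong lastCheckMs = new AtomicLong(0L);

    void maybeCheckDataDirs(Runnable expensiveCheck) {
        long now = System.currentTimeMillis();
        long prev = lastCheckMs.get();
        // compareAndSet ensures exactly one winner per interval even
        // under concurrent calls from many DataXceiver threads.
        if (now - prev >= MIN_CHECK_INTERVAL_MS
                && lastCheckMs.compareAndSet(prev, now)) {
            expensiveCheck.run();
        }
    }
}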
