hadoop-common-dev mailing list archives

From "Igor Bolotin" <ig...@collarity.com>
Subject Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
Date Wed, 28 Mar 2007 02:39:00 GMT
While investigating performance issues in our Hadoop DFS/MapReduce
cluster, I saw very high CPU usage by the DataNode processes.

The stack trace showed the following on most of the data nodes:

 

"org.apache.hadoop.dfs.DataNode$DataXceiveServer@528acf6e" daemon prio=1
tid=0x00002aaacb5b7bd0 nid=0x5940 runnable
[0x000000004166a000..0x000000004166ac00]

        at java.io.UnixFileSystem.checkAccess(Native Method)

        at java.io.File.canRead(File.java:660)

        at
org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:34)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:164)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)

        at
org.apache.hadoop.dfs.FSDataset$FSVolume.checkDirs(FSDataset.java:258)

        at
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.checkDirs(FSDataset.java:339
)

        - locked <0x00002aaab6fb8960> (a
org.apache.hadoop.dfs.FSDataset$FSVolumeSet)

        at
org.apache.hadoop.dfs.FSDataset.checkDataDir(FSDataset.java:544)

        at
org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:535)

        at java.lang.Thread.run(Thread.java:595)

 

I understand that it would take a while to check the entire data
directory, since we have some 180,000 blocks/files in there. What
really bothers me is that, from the code, this check is executed for
every client connection to the DataNode, which also means for every
task executed in the cluster. Once I commented out the check and
restarted the datanodes, performance went up and CPU usage dropped to a
reasonable level.
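
For a sense of where the cost comes from: the recursive walk implied by the
FSDir.checkDirTree frames above amounts to roughly the following. This is a
standalone sketch with names of my own choosing, not the actual FSDataset
code, so treat the details as assumptions:

import java.io.File;

// Illustrative sketch of the recursive directory check suggested by the
// checkDirTree frames in the stack trace; not the real Hadoop source.
public class DataDirCheckSketch {

    // Roughly what a per-directory check does: fail if the path is not a
    // readable/writable directory (one native checkAccess call per test).
    static void checkDir(File dir) {
        if (!dir.isDirectory() || !dir.canRead() || !dir.canWrite()) {
            throw new RuntimeException("Bad data directory: " + dir);
        }
    }

    // Recursive walk over the whole block tree. With ~180,000 blocks/files,
    // every invocation touches the entire tree on disk.
    static void checkDirTree(File dir) {
        checkDir(dir);
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    checkDirTree(child);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Calling this once per accepted client connection, as the
        // DataXceiveServer.run frame suggests, multiplies the full-tree
        // walk by the request rate.
        checkDirTree(new File(args.length > 0 ? args[0] : "."));
    }
}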

 

Now the question is: am I missing something here, or should this check
really be removed?
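
If removing the call outright feels too aggressive, one possible middle
ground would be to throttle it so the expensive scan runs at most once per
interval rather than on every connection. A minimal sketch of that idea
follows; the class name, field names, and the 10-minute interval are my
assumptions, not anything in the Hadoop source:

// Sketch of a throttled check: run the expensive directory scan at most
// once every CHECK_INTERVAL_MS instead of on every accepted connection.
class ThrottledDataDirCheck {
    private static final long CHECK_INTERVAL_MS = 10 * 60 * 1000L; // assumed 10 minutes
    private volatile long lastCheck = 0L;

    void maybeCheck(Runnable expensiveCheck) {
        long now = System.currentTimeMillis();
        if (now - lastCheck >= CHECK_INTERVAL_MS) {
            synchronized (this) {
                if (now - lastCheck >= CHECK_INTERVAL_MS) {
                    expensiveCheck.run();   // e.g. the recursive data-dir walk
                    lastCheck = now;
                }
            }
        }
    }
}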

 

Best regards,

Igor Bolotin
www.collarity.com
