hadoop-common-dev mailing list archives

From "Igor Bolotin (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1170) Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
Date Wed, 28 Mar 2007 04:46:32 GMT
Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
--------------------------------------------------------------------------------------

                 Key: HADOOP-1170
                 URL: https://issues.apache.org/jira/browse/HADOOP-1170
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.11.2
            Reporter: Igor Bolotin


While investigating performance issues in our Hadoop DFS/MapReduce cluster I saw very high
CPU usage by DataNode processes.

A stack trace showed the following on most of the data nodes:
"org.apache.hadoop.dfs.DataNode$DataXceiveServer@528acf6e" daemon prio=1 tid=0x00002aaacb5b7bd0
nid=0x5940 runnable [0x000000004166a000..0x000000004166ac00]
        at java.io.UnixFileSystem.checkAccess(Native Method)
        at java.io.File.canRead(File.java:660)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:34)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:164)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.checkDirs(FSDataset.java:258)
        at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.checkDirs(FSDataset.java:339)
        - locked <0x00002aaab6fb8960> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
        at org.apache.hadoop.dfs.FSDataset.checkDataDir(FSDataset.java:544)
        at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:535)
        at java.lang.Thread.run(Thread.java:595)

I understand that it would take a while to check the entire data directory, as we have some
180,000 blocks/files in there. But what really bothers me is that, from the code, this check
is executed for every client connection to the DataNode, which also means for every task
executed in the cluster. Once I commented out the check and restarted the datanodes,
performance went up and CPU usage dropped to a reasonable level.
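One way to avoid dropping the check entirely would be to throttle it so the expensive
directory scan runs at most once per interval rather than on every connection. The sketch
below is hypothetical (the class name, the interval, and the counter are illustrative, not
the actual Hadoop code); the counter stands in for the call to FSDataset.checkDataDir():

```java
// Hypothetical sketch: rate-limit an expensive directory check so it runs
// at most once per interval instead of on every client connection.
public class ThrottledDirCheck {
    private final long intervalMs;
    private long lastCheckMs;
    private int checksRun = 0;

    public ThrottledDirCheck(long intervalMs) {
        this.intervalMs = intervalMs;
        // Allow the very first call to trigger a check.
        this.lastCheckMs = -intervalMs;
    }

    // Called on every incoming connection; the expensive scan only
    // actually runs when the interval has elapsed.
    public synchronized void maybeCheck(long nowMs) {
        if (nowMs - lastCheckMs >= intervalMs) {
            lastCheckMs = nowMs;
            checksRun++;  // stand-in for FSDataset.checkDataDir()
        }
    }

    public synchronized int getChecksRun() {
        return checksRun;
    }

    public static void main(String[] args) {
        ThrottledDirCheck check = new ThrottledDirCheck(60_000);
        // Simulate 10,000 connections arriving 1 ms apart (~10 seconds).
        for (int i = 0; i < 10_000; i++) {
            check.maybeCheck(i);
        }
        // Without throttling this would be 10,000 full scans; with a
        // 60 s interval only the first connection triggers one.
        System.out.println("scans run: " + check.getChecksRun());
    }
}
```

With 180,000 blocks per node, cutting the scan from once per connection to once per minute
would remove almost all of the checkAccess() calls seen in the trace above, while still
catching failed disks reasonably quickly.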


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

