From: "Hadoop QA (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Tue, 27 Mar 2007 22:14:32 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1170) Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
In-Reply-To: <20559471.1175057192137.JavaMail.jira@brutus>
Message-ID: <18134522.1175058872378.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484711 ]

Hadoop QA commented on HADOOP-1170:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354393/1170.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/523072. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> Very high CPU usage on data nodes because of FSDataset.checkDataDir() on every connect
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1170
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1170
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.11.2
>            Reporter: Igor Bolotin
>         Attachments: 1170.patch
>
>
> While investigating performance issues in our Hadoop DFS/MapReduce cluster, I saw very high CPU usage by the DataNode processes.
> The stack trace showed the following on most of the data nodes:
> "org.apache.hadoop.dfs.DataNode$DataXceiveServer@528acf6e" daemon prio=1 tid=0x00002aaacb5b7bd0 nid=0x5940 runnable [0x000000004166a000..0x000000004166ac00]
>         at java.io.UnixFileSystem.checkAccess(Native Method)
>         at java.io.File.canRead(File.java:660)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:34)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:164)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:168)
>         at org.apache.hadoop.dfs.FSDataset$FSVolume.checkDirs(FSDataset.java:258)
>         at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.checkDirs(FSDataset.java:339)
>         - locked <0x00002aaab6fb8960> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
>         at org.apache.hadoop.dfs.FSDataset.checkDataDir(FSDataset.java:544)
>         at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:535)
>         at java.lang.Thread.run(Thread.java:595)
> I understand that it would take a while to check the entire data directory, as we have some 180,000 blocks/files in there. But what really bothers me is that, from the code, this check is executed for every client connection to the DataNode, which also means for every task executed in the cluster. Once I commented out the check and restarted the datanodes, performance went up and CPU usage dropped to a reasonable level.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
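[Editor's note: the contents of 1170.patch are not inlined in this thread, so the actual fix is not shown here. One common remedy for a per-connection health check like FSDataset.checkDataDir() is to rate-limit it so the full directory scan runs at most once per interval rather than on every DataXceiveServer connection. A minimal sketch of such a throttle follows; the class and method names (ThrottledDirChecker, shouldCheck) are hypothetical and not taken from the patch.]

```java
// Hypothetical sketch, not the code from 1170.patch: gate an expensive
// per-connection check (like FSDataset.checkDataDir()) behind a throttle
// so the full scan of ~180,000 block files runs at most once per interval.
public class ThrottledDirChecker {
    private final long minIntervalMs;  // minimum gap between full scans
    private long lastCheckMs = -1;     // -1 = never checked yet

    public ThrottledDirChecker(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true only when enough time has passed to run the scan again. */
    public synchronized boolean shouldCheck(long nowMs) {
        if (lastCheckMs < 0 || nowMs - lastCheckMs >= minIntervalMs) {
            lastCheckMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ThrottledDirChecker checker = new ThrottledDirChecker(60_000L);
        // First connection triggers the scan; later ones within a minute skip it.
        System.out.println(checker.shouldCheck(0L));      // true
        System.out.println(checker.shouldCheck(30_000L)); // false
        System.out.println(checker.shouldCheck(61_000L)); // true
    }
}
```

With a throttle like this, the connection-handling loop would call shouldCheck() before checkDataDir(), preserving periodic disk-failure detection while removing the per-task CPU cost the reporter observed.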