From: "Hudson (JIRA)"
To: core-dev@hadoop.apache.org
Date: Wed, 5 Mar 2008 08:01:41 -0800 (PST)
Subject: [jira] Commented: (HADOOP-2883) Extensive write failures

[
https://issues.apache.org/jira/browse/HADOOP-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575396#action_12575396 ]

Hudson commented on HADOOP-2883:
--------------------------------

Integrated in Hadoop-trunk #420 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/420/])

> Extensive write failures
> ------------------------
>
>                 Key: HADOOP-2883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2883
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.16.1
>
>         Attachments: packetResponse.patch, packetResponse_0.16.patch
>
>
> With the new release 0.16.0 we experience extensive write failures under heavy load.
> The job shuffles 300TB on 1400 nodes and runs 3 waves of 2500 reducers. Each reducer uses libhdfs to write to around 70 dfs files simultaneously. We did not experience particular write problems up to nightly build #835 on hadoopqa (Jan 28),
> but now with released 0.16.0 (candidate 2) we see a lot of exceptions related to 'all datanodes are bad':
> typical exception(s):
> 08/02/22 10:34:47 WARN fs.DFSClient: Error Recovery for block blk_434406883423887779 in pipeline xxx.xxx.xxx.146:50010, xxx.xxx.xxx.224:50010: bad datanode xxx.xxx.xxx.146:50010
> 08/02/22 10:34:51 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:34:51 WARN fs.DFSClient: Error Recovery for block blk_-1957866292089920792 in pipeline xxx.xxx.xxx.147:50010, xxx.xxx.xxx.10:50010: bad datanode xxx.xxx.xxx.147:50010
> 08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:34:54 WARN fs.DFSClient: Error Recovery for block blk_-5265240773298481019 in pipeline xxx.xxx.xxx.152:50010, xxx.xxx.xxx.71:50010: bad datanode xxx.xxx.xxx.152:50010
> 08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:34:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed outxxx.xxx.xxx.166:50010
> 08/02/22 10:34:55 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_8456718220685890569 in pipeline xxx.xxx.xxx.158:50010, xxx.xxx.xxx.225:50010: bad datanode xxx.xxx.xxx.158:50010
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_1420965154382429572 in pipeline xxx.xxx.xxx.169:50010, xxx.xxx.xxx.221:50010: bad datanode xxx.xxx.xxx.169:50010
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-519424763987472708 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.37:50010: bad datanode xxx.xxx.xxx.154:50010
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-8376556524788296783 in pipeline xxx.xxx.xxx.154:50010, xxx.xxx.xxx.212:50010: bad datanode xxx.xxx.xxx.154:50010
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-2429564741658530079 in pipeline xxx.xxx.xxx.160:50010, xxx.xxx.xxx.105:50010: bad datanode xxx.xxx.xxx.160:50010
> 08/02/22 10:35:00 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:00 WARN fs.DFSClient: Error Recovery for block blk_-6653210787685458124 in pipeline xxx.xxx.xxx.143:50010, xxx.xxx.xxx.37:50010: bad datanode xxx.xxx.xxx.143:50010
> 08/02/22 10:35:01 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:01 WARN fs.DFSClient: Error Recovery for block blk_7515160028005424426 in pipeline xxx.xxx.xxx.167:50010, xxx.xxx.xxx.152:50010: bad datanode xxx.xxx.xxx.167:50010
> 08/02/22 10:35:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:03 WARN fs.DFSClient: Error Recovery for block blk_-7191475898558388503 in pipeline xxx.xxx.xxx.139:50010, xxx.xxx.xxx.6:50010: bad datanode xxx.xxx.xxx.139:50010
> 08/02/22 10:35:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:03 WARN fs.DFSClient: Error Recovery for block blk_-340745015348833165 in pipeline xxx.xxx.xxx.141:50010, xxx.xxx.xxx.153:50010: bad datanode xxx.xxx.xxx.141:50010
> 08/02/22 10:35:04 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:04 WARN fs.DFSClient: Error Recovery for block blk_-6861254790596076341 in pipeline xxx.xxx.xxx.157:50010, xxx.xxx.xxx.224:50010: bad datanode xxx.xxx.xxx.157:50010
> 08/02/22 10:35:14 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:14 INFO fs.DFSClient: Abandoning block blk_6188945400680100475
> 08/02/22 10:35:14 INFO fs.DFSClient: Waiting to find target node: xxx.xxx.xxx.161:50010
> 08/02/22 10:35:43 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:47 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:48 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:49 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:49 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:50 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:53 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:54 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:57 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:35:57 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:03 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:04 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:06 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:06 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 08/02/22 10:36:07 INFO fs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.io.IOException: All datanodes xxx.xxx.xxx.83:50010 are bad. Aborting...
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1839)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
> Call to org.apache.hadoop.fs.FSDataOutputStream::write failed!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
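The workload in the HADOOP-2883 report (300TB shuffled on 1400 nodes, 3 waves of 2500 reducers, each reducer writing to around 70 DFS files at once) can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic on the figures quoted in the report; it assumes reducers are spread evenly across the nodes, treats "around 70" as exactly 70, and reads 300TB as decimal terabytes. None of these assumptions come from the report, and the class and method names are hypothetical.

```java
/**
 * Hypothetical back-of-the-envelope estimate of the write load described in
 * HADOOP-2883. Inputs are the figures quoted in the report; even spreading of
 * reducers over nodes and decimal terabytes are assumptions of this sketch.
 */
public class WriteLoadEstimate {

    // DFS output streams open at the same time during one wave of reducers.
    static long concurrentStreams(long reducersPerWave, long filesPerReducer) {
        return reducersPerWave * filesPerReducer;
    }

    // Average streams per node, assuming reducers are spread evenly (an assumption).
    static double streamsPerNode(long concurrentStreams, long nodes) {
        return (double) concurrentStreams / nodes;
    }

    // Total files written by the job across all waves.
    static long totalFiles(long waves, long reducersPerWave, long filesPerReducer) {
        return waves * reducersPerWave * filesPerReducer;
    }

    // Average shuffle input per reducer, in decimal gigabytes.
    static double shufflePerReducerGB(double shuffleBytes, long totalReducers) {
        return shuffleBytes / totalReducers / 1e9;
    }

    public static void main(String[] args) {
        final long nodes = 1400;
        final long waves = 3;
        final long reducersPerWave = 2500;
        final long filesPerReducer = 70;    // "around 70"; treated as exact here
        final double shuffleBytes = 300e12; // 300TB, assuming decimal terabytes

        long streams = concurrentStreams(reducersPerWave, filesPerReducer);
        System.out.println("Concurrent streams per wave: " + streams);
        System.out.println("Streams per node: " + streamsPerNode(streams, nodes));
        System.out.println("Total files: " + totalFiles(waves, reducersPerWave, filesPerReducer));
        System.out.println("Shuffle per reducer (GB): "
                + shufflePerReducerGB(shuffleBytes, waves * reducersPerWave));
    }
}
```

Under these assumptions the job keeps about 175,000 DFS output streams open per wave, roughly 125 per node, writes about 525,000 files in total, and shuffles about 40 GB into each reducer. The estimate only puts numbers on the "heavy load" the reporter mentions; it does not diagnose the timeouts in the log.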