From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Date: Mon, 28 Jul 2008 12:45:31 -0700 (PDT)
Message-ID: <1279838139.1217274331638.JavaMail.jira@brutus>
In-Reply-To: <1394849750.1217002771659.JavaMail.jira@brutus>
Subject: [jira] Updated: (HADOOP-3831) slow-reading dfs clients do not recover from datanode-write-timeouts
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/HADOOP-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3831:
---------------------------------

    Attachment: HADOOP-3831.patch

> slow-reading dfs clients do not recover from datanode-write-timeouts
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3831
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3831.patch, HADOOP-3831.patch
>
>
> Some of our applications read certain files from dfs (using libhdfs) much more slowly than others, slowly enough that they trigger the write timeout introduced into the datanodes in 0.17.x. Eventually they fail.
> Dfs clients should be able to recover from such a situation.
> In the meantime, would setting
> dfs.datanode.socket.write.timeout=0
> in hadoop-site.xml help?
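For reference, that override belongs in hadoop-site.xml on the datanode side. A minimal sketch of it follows; it assumes a value of 0 is read as "no timeout", which is worth confirming against the 0.17 defaults in hadoop-default.xml:

  <?xml version="1.0"?>
  <!-- hadoop-site.xml on the datanodes; entries here override hadoop-default.xml -->
  <configuration>
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <!-- milliseconds; 0 is assumed here to disable the write timeout entirely -->
      <value>0</value>
    </property>
  </configuration>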
> Here are the exceptions I see:
>
> DataNode:
>
> 2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
>         at java.lang.Thread.run(Thread.java:619)
>
> DFS Client:
>
> 08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>         at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>         at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
>         at java.io.DataInputStream.read(DataInputStream.java:83)
> 08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node: java.io.IOException: No live nodes contain current block
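The recovery the issue asks for amounts to the client retrying the read instead of failing outright. Below is a minimal, hypothetical sketch against the public FileSystem API (illustrative only, not the attached patch; the class name and retry policy are made up): on an IOException the stream is closed and reopened, and the read resumes at the last offset that was read successfully, possibly from a different datanode.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RetryingReader {

    /** Reads the whole file, reopening and seeking past transient failures. */
    public static long readWithRetries(FileSystem fs, Path path, int maxRetries)
        throws IOException {
      long pos = 0;                // first byte not yet read successfully
      int failures = 0;
      byte[] buf = new byte[64 * 1024];
      FSDataInputStream in = fs.open(path);
      try {
        while (true) {
          try {
            in.seek(pos);
            int n = in.read(buf, 0, buf.length);
            if (n < 0) {
              return pos;          // clean end of file
            }
            pos += n;              // consume buf[0..n) here
            failures = 0;          // made progress: reset the retry budget
          } catch (IOException e) {
            if (++failures > maxRetries) {
              throw e;             // out of retries: surface the last failure
            }
            in.close();            // drop the possibly-dead connection
            in = fs.open(path);    // reopen; a different datanode may be picked
          }
        }
      } finally {
        in.close();
      }
    }

    public static void main(String[] args) throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());
      long bytes = readWithRetries(fs, new Path(args[0]), 3);
      System.out.println("read " + bytes + " bytes");
    }
  }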