Message-ID: <1664404254.1207635324645.JavaMail.jira@brutus>
Date: Mon, 7 Apr 2008 23:15:24 -0700 (PDT)
From: "Runping Qi (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Subject: [jira] Commented: (HADOOP-3124) DFS data node should not use hard coded 10 minutes as write timeout.
In-Reply-To: <1506253783.1206736104348.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586669#action_12586669 ]

Runping Qi commented on HADOOP-3124:
------------------------------------

There seems to be at least one place where the constant is still used as the timeout value:

{code}
@@ -848,7 +859,7 @@
   /* utility function for sending a respose */
   private static void sendResponse(Socket s, short opStatus) throws IOException {
     DataOutputStream reply =
-      new DataOutputStream(new SocketOutputStream(s, WRITE_TIMEOUT));
+      new DataOutputStream(NetUtils.getOutputStream(s, WRITE_TIMEOUT));
{code}

Is this intended?

> DFS data node should not use hard coded 10 minutes as write timeout.
> --------------------------------------------------------------------
>
>              Key: HADOOP-3124
>              URL: https://issues.apache.org/jira/browse/HADOOP-3124
>          Project: Hadoop Core
>       Issue Type: Bug
>       Components: dfs
> Affects Versions: 0.17.0
>        Reporter: Runping Qi
>        Assignee: Raghu Angadi
>     Attachments: HADOOP-3124.patch
>
>
> This problem happens in the 0.17 trunk.
> I saw reducers wait 10 minutes for writing data to DFS and get a timeout.
> The client retried and timed out after another 19 minutes.
> After looking into the code, it seems that the DFS data node uses 10 minutes as the timeout for writing data into the data node pipeline.
> I think we have three issues:
> 1. The 10-minute timeout value is too big for writing a chunk of data (64K) through the data node pipeline.
> 2. The timeout value should not be hard coded.
> 3. Different datanodes in a pipeline should use different timeout values for writing to the downstream.
> A reasonable one may be (20 secs * numOfDataNodesInTheDownStreamPipe).
> For example, if the replication factor is 3, the client uses 60 secs, the first data node uses 40 secs, and the second datanode uses 20 secs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
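The per-position timeout scheme proposed in the issue can be sketched in Java. This is a minimal illustration of the arithmetic only, assuming the suggested 20-second increment per downstream node; the class, method, and constant names here are invented for the sketch and are not the actual Hadoop API.

```java
// Sketch of the timeout scheme proposed in HADOOP-3124: each writer in the
// pipeline scales its write timeout by how many datanodes are downstream of it.
// All names below are hypothetical, not Hadoop's real identifiers.
public class WriteTimeoutSketch {

    // Suggested base increment: 20 seconds per downstream datanode.
    static final long WRITE_TIMEOUT_INCREMENT_MS = 20_000L;

    /**
     * Write timeout for a node (or the client) that has the given number of
     * datanodes downstream of it in the pipeline.
     */
    static long writeTimeout(int numNodesDownstream) {
        return WRITE_TIMEOUT_INCREMENT_MS * numNodesDownstream;
    }

    public static void main(String[] args) {
        // Replication factor 3: the client has 3 nodes downstream, the first
        // datanode has 2, and the second datanode has 1 (the example above).
        System.out.println("client:          " + writeTimeout(3) + " ms"); // 60000
        System.out.println("first datanode:  " + writeTimeout(2) + " ms"); // 40000
        System.out.println("second datanode: " + writeTimeout(1) + " ms"); // 20000
    }
}
```

With this scheme an upstream writer always waits longer than anything below it, so a downstream timeout fires (and is reported) before the upstream one, instead of every stage waiting the same hard-coded 10 minutes.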