Date: Sat, 11 Jun 2011 18:26:58 +0000 (UTC)
From: "Jonathan Hsieh (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <488183047.14152.1307816818931.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1176966401.11904.1301078529898.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-1787) "Not enough xcievers" error should propagate to client

[
https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047962#comment-13047962 ]

Jonathan Hsieh commented on HDFS-1787:
--------------------------------------

{quote}
> Text.readString can throw IOException.
The InternalDataNodeException thrown on the next line is also a subclass of IOException. Behavior-wise, it would essentially use the same error recovery path. However, we will lose information such as socket addresses.
{quote}
I believe this is already an error path, but I'll look into this more.

{quote}
Some comments:
Please combine them into one message.
{code}
+ DFSClient.LOG.warn("Failed to connect to" + targetAddr +": " +
+ ex.getMessage());
+ DFSClient.LOG.warn(" Adding to deadNodes and continuing");
{code}
{quote}
My plan is to merge them into a single log message, using \n's to separate the lines.

{quote}
It is better to log the exception.
{code}
+ } catch (IOException e) {
+ // preserve previous semantics, eat the exception.
+ }
{code}
{quote}
Will add logging.

{quote}
Do we really need internalDNErrors and getInternalDNErrorCount()? It is only used in the tests.
{quote}
Can you suggest an alternate mechanism for (automated) testing of the changes, other than visual inspection of the logs? The counter lets a test verify that the error messaging path was exercised, and it provides some information that may be useful in troubleshooting. I believe there are annotations in the works that semantically mean "public for testing but otherwise private/package". I believe the comment I added would make this reasonably easy to find when those annotations get integrated throughout.

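For illustration only, the three review points above (one combined message, logging the exception instead of eating it, and a counter that tests can read) might look roughly like the sketch below. This is not the patch: the class name, method names, and the address in main are hypothetical, and it uses java.util.logging so that it is self-contained, whereas DFSClient uses its own LOG.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ConnectFailureLogging {
    private static final Logger LOG = Logger.getLogger("ConnectFailureLogging");

    // Hypothetical stand-in for internalDNErrors; intended to be read only by tests.
    private static final AtomicInteger internalDnErrors = new AtomicInteger();

    /** Builds one combined warning instead of two separate LOG.warn calls. */
    static String connectFailureMessage(String targetAddr, IOException ex) {
        return "Failed to connect to " + targetAddr + ": " + ex.getMessage()
                + "\n  Adding to deadNodes and continuing";
    }

    /** Records the failure and logs it with the exception attached (not swallowed). */
    static void onConnectFailure(String targetAddr, IOException ex) {
        internalDnErrors.incrementAndGet();
        LOG.log(Level.WARNING, connectFailureMessage(targetAddr, ex), ex);
    }

    /** Test hook: how many times the error path was exercised. */
    static int getInternalDnErrorCount() {
        return internalDnErrors.get();
    }

    public static void main(String[] args) {
        // Example address and message are made up for the demo.
        onConnectFailure("127.0.0.1:50010", new IOException("xceiver limit exceeded"));
        System.out.println("errors seen: " + getInternalDnErrorCount());
    }
}
```

A test can then assert on the counter rather than scraping the logs, which is the trade-off being discussed in the last reply.
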
> "Not enough xcievers" error should propagate to client
> ------------------------------------------------------
>
>                 Key: HDFS-1787
>                 URL: https://issues.apache.org/jira/browse/HDFS-1787
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Jonathan Hsieh
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: hdfs-1787.2.patch, hdfs-1787.3.patch, hdfs-1787.3.patch, hdfs-1787.5.patch, hdfs-1787.patch
>
>
> We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the "xceiver limit exceeded" error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue.
> The data transfer protocol should be extended to either have a special error code for "not enough xceivers" or should have some error code for generic errors with which a string can be attached and propagated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
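
As a rough sketch of the second option in the issue description (a generic error code with an attached string), the reply could carry a status followed by a message that the client turns into a descriptive IOException rather than hitting an EOFException. Everything here is an assumption for illustration: the status constants, the class and method names, and the use of writeUTF/readUTF are hypothetical and are not the actual HDFS data transfer wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class StatusWithMessage {
    // Hypothetical status codes, not the real protocol constants.
    static final short STATUS_SUCCESS = 0;
    static final short STATUS_ERROR = 1;

    /** Datanode side: write the status, plus a message only when reporting an error. */
    static byte[] encodeReply(short status, String message) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeShort(status);
            if (status != STATUS_SUCCESS) {
                out.writeUTF(message);
            }
        }
        return buf.toByteArray();
    }

    /** Client side: surface the datanode's message instead of a bare EOFException. */
    static void checkReply(byte[] reply) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply));
        short status = in.readShort();
        if (status != STATUS_SUCCESS) {
            throw new IOException("Datanode reported error: " + in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        checkReply(encodeReply(STATUS_SUCCESS, ""));
        try {
            checkReply(encodeReply(STATUS_ERROR, "xceiver limit exceeded"));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A generic code plus string keeps the protocol from needing a new constant for every failure cause, at the cost of clients having to match on text if they want to react to a specific error.
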