Date: Thu, 4 Apr 2013 10:25:57 -0700
Subject: Re: Max DataXceiver Exceeded Logs in DataNode log files
From: Chris Nauroth <cnauroth@hortonworks.com>
To: hdfs-dev@hadoop.apache.org

Hello Sreekanth,

The threads that you see named "PacketResponder" are in fact the threads allocated from the BlockReceiver class. As you noticed, they are placed into the same thread group as the threads named "DataXceiver" allocated from the DataXceiverServer class. The current count is determined by the activeCount of the thread group, and the sum of those threads (2796 + 1336) has exceeded the configured max of 4096. (Note that ThreadGroup.activeCount is not precise, so in this example the total has gone a bit above 4096.)

This is expected behavior. The "DataXceiver" and "PacketResponder" threads are designed to work together as a pair: a receiver and a responder. There is no way to configure the limits separately for "DataXceiver" vs. "PacketResponder". Since the two threads need to cooperate as a pair, I don't think we'd want to provide the capability to configure their limits independently either. (That could cause confusing error scenarios where the two are configured with different limits, and you have capacity for one kind of thread but not the other.)
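To make the shared-budget behavior concrete, here is a minimal sketch of the pattern described above. This is not the actual Hadoop source; the class and method names (XceiverLimitSketch, startXceiverPair, checkXceiverCount) are invented for illustration. What it shows is the essential point: both thread kinds live in one ThreadGroup, and the limit check compares that single group's activeCount() against one configured maximum.

```java
import java.io.IOException;

// Illustrative sketch only, not Hadoop code: "DataXceiver" and
// "PacketResponder" threads share one ThreadGroup, so they also share
// one thread budget.
public class XceiverLimitSketch {
    static final int MAX_XCEIVER_COUNT = 4096;

    static final ThreadGroup dataXceiverGroup = new ThreadGroup("dataXceiverServer");

    // One shared count covers both kinds of thread. activeCount() is only
    // an estimate, which is why an observed total (2796 + 1336 = 4132) can
    // drift slightly past the 4096 limit before the check fires.
    static void checkXceiverCount(int activeInGroup) throws IOException {
        if (activeInGroup > MAX_XCEIVER_COUNT) {
            throw new IOException("xceiverCount " + activeInGroup
                    + " exceeds the limit of concurrent xcievers " + MAX_XCEIVER_COUNT);
        }
    }

    // Each incoming transfer spawns a receiver/responder pair in the
    // shared group, so every connection consumes two slots of the budget.
    static void startXceiverPair(Runnable receive, Runnable respond) throws IOException {
        checkXceiverCount(dataXceiverGroup.activeCount());
        new Thread(dataXceiverGroup, receive, "DataXceiver").start();
        new Thread(dataXceiverGroup, respond, "PacketResponder").start();
    }
}
```

This is why a jstack showing 1336 DataXceiver threads plus 2796 PacketResponder threads is enough to trip a limit of 4096, even though neither kind alone is near it.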
The only way to tune the max thread count is the max xceiver configuration parameter that you are already using, and you can think of this as the thread pool size for the whole thing.

I hope this helps.

Thanks,
--Chris

On Thu, Apr 4, 2013 at 4:28 AM, Sreekanth Ramakrishnan <sreekanth.ramakrishnan@inmobi.com> wrote:

> Posting my previous mail from the issues mailing list to the dev mailing list.
>
> Hi All,
>
> We are currently running a Hadoop 0.20.x cluster in our environment. We have
> recently been observing slowdowns of datanodes, and DFSClient times out.
> Looking at the logs on the data nodes, we noticed quite a few "Max
> DataXceiver exceeded" exception messages of the following format:
>
> java.io.IOException: xceiverCount 4114 exceeds the limit of concurrent
> xcievers 4096
>
> Our cluster configuration allows a max of 4096 DataXceivers, and due to this
> exception our DFS clients are getting blocked, slowing down DFS performance
> from the client's perspective.
>
> When a jstack of the datanode process was checked, it showed that out of
> 4166 active threads in the JVM, 1336 were DataXceiver threads and 2796 were
> PacketResponder threads. Shouldn't the DataNode spawn 2760 more DataXceivers
> before throwing the IOException?
>
> Also, looking at the code, it seems that we are not setting a different
> thread group for BlockReceiver, which causes the thread pool to be split
> between BlockReceiver and DataXceiver. Is this intentional?
>
> Are there any workarounds to ensure that the max allocation of threads goes
> to DataXceiver?
>
> Or should I go ahead and file a JIRA regarding this issue?
>
> Sreekanth Ramakrishnan
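As a footnote to the tuning advice in Chris's reply: on 0.20-era clusters that limit is commonly raised in hdfs-site.xml. The property below (with its well-known historical misspelling "xcievers", which matches the exception text in this thread) is the usual name, and 8192 is only an example value; check your distribution's documentation before relying on it.

```xml
<!-- hdfs-site.xml: raises the shared DataXceiver/PacketResponder thread
     budget on each DataNode. Example value only; note the historical
     misspelling "xcievers" in the property name. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
```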