Date: Wed, 13 Apr 2016 11:44:25 +0000 (UTC)
From: "Walter Su (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

[ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239103#comment-15239103 ]

Walter Su commented on HDFS-9412:
---------------------------------

One thread holding the readLock for too long is much like one holding the writeLock. We should avoid that. Also, after HDFS-8824 the small blocks are unused anyway, so there is no point in sending them to the balancer.

Hi [~He Tianyi], would you mind rebasing the patch?

> getBlocks occupies FSLock and takes too long to complete
> --------------------------------------------------------
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: He Tianyi
> Assignee: He Tianyi
> Attachments: HDFS-9412.0000.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and may then take a long time to complete (probably several seconds if there are many blocks).
> During this period, other threads attempting to acquire the write lock have to wait.
> In an extreme case, all RPC handlers are occupied, one by a reader thread calling {{getBlocks}} and the rest by threads waiting for the write lock, so the RPC server acts as if it were hung. Unfortunately, this tends to happen in heavily loaded clusters, since read operations come and go quickly (they do not need to wait), leaving write operations waiting.
> It looks like we could optimize this the way the DN block report was optimized in the past: split the operation into smaller sub-operations and let other threads do their work between sub-operations. The whole result would still be returned at once, though (one difference from the DN block report).
> I am not sure whether this will work. Any better ideas?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
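
The splitting idea in the issue description can be illustrated with a small, self-contained sketch. This is not the NameNode code and not the attached patch: the class {{ChunkedBlockScan}}, its methods, and the {{chunkSize}} parameter are all hypothetical, and a plain list of block IDs stands in for the real block structures. It only shows the general pattern of taking the read lock once per chunk so that a waiting writer can get in between chunks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a lock-protected block list; not the NameNode code. */
final class ChunkedBlockScan {
    // Fair mode (an assumption of this sketch) favors the longest-waiting
    // thread, so a queued writer is not overtaken by later readers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final List<Long> blocks = new ArrayList<>();

    /** Writer path: appends one block ID under the write lock. */
    void addBlock(long blockId) {
        lock.writeLock().lock();
        try {
            blocks.add(blockId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Reader path: copies the block IDs in chunks of at most chunkSize,
     * releasing the read lock after every chunk. The whole result is still
     * returned at once, as the issue description suggests.
     */
    List<Long> getBlocks(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> result = new ArrayList<>();
        int next = 0;
        boolean more = true;
        while (more) {
            lock.readLock().lock();
            try {
                // Re-read the size under the lock: it may have grown
                // while the lock was released between chunks.
                int end = Math.min(next + chunkSize, blocks.size());
                for (int i = next; i < end; i++) {
                    result.add(blocks.get(i));
                }
                next = end;
                more = next < blocks.size();
            } finally {
                lock.readLock().unlock();
            }
            // The read lock is not held here, so waiting writers can proceed.
        }
        return result;
    }
}
```

For example, after ten calls to {{addBlock}}, {{getBlocks(3)}} acquires the read lock four times instead of holding it across the whole scan. The trade-off is that the result is no longer an atomic snapshot: blocks added between chunks may appear in it, and a real implementation would also have to cope with removals between chunks, which this sketch does not model.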