Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 98E731083F for ; Sat, 15 Mar 2014 08:20:47 +0000 (UTC) Received: (qmail 260 invoked by uid 500); 15 Mar 2014 08:20:46 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 231 invoked by uid 500); 15 Mar 2014 08:20:46 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 191 invoked by uid 99); 15 Mar 2014 08:20:43 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 15 Mar 2014 08:20:43 +0000 Date: Sat, 15 Mar 2014 08:20:43 +0000 (UTC) From: "Colin Patrick McCabe (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936087#comment-13936087 ] Colin Patrick McCabe commented on HDFS-6093: -------------------------------------------- bq. Arpit said: In addition to reducing the timeout as you suggested, can we add some explanation to the command output, or update CentralizedCacheManagement.html in the docs? I agree, we could add a short comment to the docs about this. Now that the timeout has been reduced, there should be much less discrepancy between the output of the two commands, of course. Taking a more detailed look at the patch now. {code} + public FsStatus getCacheStatus() throws IOException { {code} I know it seems clever to reuse the same class for getStatus and getCacheStatus, but it could become a problem if someone later adds more fields to getStatus that don't apply to getCacheStatus. I think we need our own type for this, to maintain sanity in the future. It's not that much code. {code} public long getMissingBlocksCount() throws IOException { + statistics.incrementReadOps(1); return dfs.getMissingBlocksCount(); {code} Can we put the non-caching-related incrementReadOps changes in their own JIRA? It may seem like a trivial change, but it's kind of distracting from this JIRA. Also I'm not sure I understand when we're "supposed" to increment this... {code} /** * Number of replicas pending caching. */ private long numPendingCaching; /** * Number of replicas pending uncaching. */ private long numPendingUncaching; {code} Could use a linebreak after {{numPendingCaching}} for consistency. Like I said earlier, I'd prefer to decouple the counter(s) that can be read from the CRM from the counters that the CRM uses internally during the scan. Using the same variable for both just invites bugs like the one Arpit pointed out, where rescan zeroes the counter outside the lock. {code} [CacheManager#processCacheReportImpl changes] {code} Incrementally updating the pendingUncached list and stats is a nice idea, but it seems too ambitious for 2.4 at this point. Now that the CRM interval is 30 seconds, it shouldn't be too bad to just wait for the CRM to update its stats and the lists. Additionally, we don't even know that monitor is non-null at this point, so there is an NPE here, I think. Let's leave this out and revisit it later. > Expose more caching information for debugging by users > ------------------------------------------------------ > > Key: HDFS-6093 > URL: https://issues.apache.org/jira/browse/HDFS-6093 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching > Affects Versions: 2.4.0 > Reporter: Andrew Wang > Assignee: Andrew Wang > Attachments: hdfs-6093-1.patch > > > When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a "pending caching/uncaching" count the same way we expose pending replication work. > It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)