Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DE50011096 for ; Sat, 14 Jun 2014 23:02:02 +0000 (UTC) Received: (qmail 79805 invoked by uid 500); 14 Jun 2014 23:02:02 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 79748 invoked by uid 500); 14 Jun 2014 23:02:01 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 79736 invoked by uid 99); 14 Jun 2014 23:02:01 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 14 Jun 2014 23:02:01 +0000 Date: Sat, 14 Jun 2014 23:02:01 +0000 (UTC) From: "Lars Hofhansl (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-11322) SnapshotHFileCleaner makes the wrong check for lastModified time thus causing too many cache refreshes MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031729#comment-14031729 ] Lars Hofhansl commented on HBASE-11322: --------------------------------------- Hmm... True, assuming no races and or clock skew in any way you are correct. I agree Math.max is fine too. Either way is fine with me. Keeping both times is more explicit and easier to reason about. Math.max on the other hand is simpler. +1 on Math.max if you like that better :) > SnapshotHFileCleaner makes the wrong check for lastModified time thus causing too many cache refreshes > ------------------------------------------------------------------------------------------------------ > > Key: HBASE-11322 > URL: https://issues.apache.org/jira/browse/HBASE-11322 > Project: HBase > Issue Type: Bug > Affects Versions: 0.94.19 > Reporter: churro morales > Assignee: churro morales > Priority: Critical > Fix For: 0.94.21 > > Attachments: 11322.94.txt, HBASE-11322.patch > > > The SnapshotHFileCleaner calls the SnapshotFileCache if a particular HFile in question is part of a snapshot. > If the HFile is not in the cache, we then refresh the cache and check again. > But the cache refresh checks to see if anything has been modified since the last cache refresh but this logic is incorrect in certain scenarios. > The last modified time is done via this operation: > {code} > this.lastModifiedTime = Math.min(dirStatus.getModificationTime(), > tempStatus.getModificationTime()); > {code} > and the check to see if the snapshot directories have been modified: > {code} > // if the snapshot directory wasn't modified since we last check, we are done > if (dirStatus.getModificationTime() <= lastModifiedTime && > tempStatus.getModificationTime() <= lastModifiedTime) { > return; > } > {code} > Suppose the following happens: > dirStatus modified 6-1-2014 > tempStatus modified 6-2-2014 > lastModifiedTime = 6-1-2014 > provided these two directories don't get modified again all subsequent checks wont exit early, like they should. > In our cluster, this was a huge performance hit. The cleaner chain fell behind, thus almost filling up dfs and our namenode heap. > Its a simple fix, instead of Math.min we use Math.max for the lastModified, I believe that will be correct. > I'll apply a patch for you guys. -- This message was sent by Atlassian JIRA (v6.2#6252)