Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1F6ED7449 for ; Wed, 20 Jul 2011 12:20:24 +0000 (UTC) Received: (qmail 46253 invoked by uid 500); 20 Jul 2011 12:20:24 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 46177 invoked by uid 500); 20 Jul 2011 12:20:22 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 46169 invoked by uid 99); 20 Jul 2011 12:20:22 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jul 2011 12:20:22 +0000 X-ASF-Spam-Status: No, hits=-1995.3 required=5.0 tests=ALL_TRUSTED,FRT_ROLEX,FS_REPLICA,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jul 2011 12:20:19 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id E39D64AAE3 for ; Wed, 20 Jul 2011 12:19:57 +0000 (UTC) Date: Wed, 20 Jul 2011 12:19:57 +0000 (UTC) From: "Hudson (JIRA)" To: issues@hbase.apache.org Message-ID: <543997571.7496.1311164397929.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <889556947.13108.1310644559909.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HBASE-4095) Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll is blocked MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068314#comment-13068314 ] Hudson commented on HBASE-4095: ------------------------------- Integrated in HBase-TRUNK #2044 (See [https://builds.apache.org/job/HBase-TRUNK/2044/]) HBASE-4095 revert, wait for further investigation HBASE-4095 revert to previous way of initializing initialReplication tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll is blocked > -------------------------------------------------------------------------------------------- > > Key: HBASE-4095 > URL: https://issues.apache.org/jira/browse/HBASE-4095 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 0.90.3 > Reporter: Jieshan Bean > Assignee: Jieshan Bean > Attachments: HBASE-4095-90-v2.patch, HBASE-4095-90.patch, HBASE-4095-trunk-v2.patch, HBASE-4095-trunk.patch, HlogFileIsVeryLarge.gif > > > Some large Hlog files(Larger than 10G) appeared in our environment, and I got the reason why they got so huge: > 1. The replicas is less than the expect number. So the method of checkLowReplication will be called each sync. > 2. The method checkLowReplication request log-roll first, and set logRollRequested as true: > {noformat} > private void checkLowReplication() { > // if the number of replicas in HDFS has fallen below the initial > // value, then roll logs. > try { > int numCurrentReplicas = getLogReplication(); > if (numCurrentReplicas != 0 && > numCurrentReplicas < this.initialReplication) { > LOG.warn("HDFS pipeline error detected. " + > "Found " + numCurrentReplicas + " replicas but expecting " + > this.initialReplication + " replicas. " + > " Requesting close of hlog."); > requestLogRoll(); > logRollRequested = true; > } > } catch (Exception e) { > LOG.warn("Unable to invoke DFSOutputStream.getNumCurrentReplicas" + e + > " still proceeding ahead..."); > } > } > {noformat} > 3.requestLogRoll() just commit the roll request. It may not execute in time, for it must got the un-fair lock of cacheFlushLock. > But the lock may be carried by the cacheflush threads. > 4.logRollRequested was true until the log-roll executed. So during the time, each request of log-roll in sync() was skipped. > Here's the logs while the problem happened(Please notice the file size of hlog "193-195-5-111%3A20020.1309937386639" in the last row): > 2011-07-06 15:28:59,284 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas. Requesting close of hlog. > 2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119, entries=32434, filesize=239589754. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639 > 2011-07-06 15:29:56,929 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas. Requesting close of hlog. > 2011-07-06 15:29:56,933 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/.tmp/4656903854447026847 to hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983 > 2011-07-06 15:29:57,391 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983, entries=445880, sequenceid=248900, memsize=207.5m, filesize=130.1m > 2011-07-06 15:29:57,478 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~207.5m for region Htable_UFDR_034,07664,1309936974158.a3780cf0c909d8cf8f8ed618b290cc95. in 10839ms, sequenceid=248900, compaction requested=false > 2011-07-06 15:28:59,236 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309926531955, entries=216459, filesize=2370387468. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119 > 2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119, entries=32434, filesize=239589754. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639 > 2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 3600000ms elapsed > 2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 3600000ms elapsed > 2011-07-06 16:30:01,978 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639, entries=1135576, filesize=19220372830. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309940998890 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira