Date: Wed, 21 Dec 2011 07:57:32 +0000 (UTC)
From: "Hudson (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1440043783.34577.1324454252728.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <363518309.32687.1324418491888.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

[
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173934#comment-13173934 ]

Hudson commented on HBASE-5078:
-------------------------------

Integrated in HBase-0.92 #205 (See [https://builds.apache.org/job/HBase-0.92/205/])
    HBASE-5078 DistributedLogSplitter failing to split file because it has edits for lots of regions

stack :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

> DistributedLogSplitter failing to split file because it has edits for lots of regions
> -------------------------------------------------------------------------------------
>
> Key: HBASE-5078
> URL: https://issues.apache.org/jira/browse/HBASE-5078
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt
>
>
> Testing 0.92.0RC, ran into an interesting issue where a log file had edits for many regions and just opening the file per region was taking so long, we were never updating our progress and so the split of the log just kept failing; in this case, the first 40 edits in a file required our opening 35 files -- opening 35 files took longer than the hard-coded 25 seconds it's supposed to take "acquiring" the task.
> First, here is master's view:
> {code}
> 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 ver = 0
> ...
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 acquired by sv4r27s44,7003,1324365396664
> ...
> 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033 ver = 3
> {code}
> Master then gives it elsewhere.
> Over on the regionserver we see:
> {code}
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker sv4r27s44,7003,1324365396664 acquired task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> 2011-12-20 17:54:10,714 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0000000000000278862.temp, syncFs=true, hflush=false
> ....
> {code}
> .... and so on till:
> {code}
> 2011-12-20 17:54:36,876 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 preempted from sv4r27s44,7003,1324365396664, current task state and owner=owned sv4r28s44,7003,1324365396678
> ....
> 2011-12-20 17:54:37,112 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> {code}
> When the above happened, we'd only processed 40 edits. As written, we only heartbeat every 1024 edits.
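
To illustrate the failure mode described in the issue, here is a minimal sketch of a progress check that fires on elapsed time as well as on edit count, so that a run of slow file opens cannot starve the heartbeat. It is not the patch attached to HBASE-5078 and does not use any HBase API: the class HeartbeatThrottle, its method names, and the 5000 ms silence limit in the usage example are all hypothetical. Only the 1024-edit interval comes from the description above.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, not part of HBase: decides when a log-splitting
 * worker should report progress, based on edit count OR elapsed time.
 */
public class HeartbeatThrottle {
    private final int editInterval;       // report after this many edits
    private final long maxSilenceMillis;  // or after this much silence (assumed value)
    private final LongSupplier clock;     // injected so the logic is testable
    private int editsSinceReport = 0;
    private long lastReportMillis;

    public HeartbeatThrottle(int editInterval, long maxSilenceMillis, LongSupplier clock) {
        this.editInterval = editInterval;
        this.maxSilenceMillis = maxSilenceMillis;
        this.clock = clock;
        this.lastReportMillis = clock.getAsLong();
    }

    /** Call once per processed edit; returns true when a heartbeat is due. */
    public boolean onEdit() {
        editsSinceReport++;
        return checkDue(editsSinceReport >= editInterval);
    }

    /** Call after a slow step, such as opening a new per-region output file. */
    public boolean onSlowStep() {
        return checkDue(false);
    }

    private boolean checkDue(boolean countDue) {
        long now = clock.getAsLong();
        boolean timeDue = now - lastReportMillis >= maxSilenceMillis;
        if (countDue || timeDue) {
            // Reset both triggers whenever a heartbeat is issued.
            editsSinceReport = 0;
            lastReportMillis = now;
            return true;
        }
        return false;
    }
}
```

A caller might construct it as new HeartbeatThrottle(1024, 5000L, System::currentTimeMillis), call onEdit() in the edit loop and onSlowStep() after each writer is created, and send the actual heartbeat whenever either returns true. With a count-only check, the 40 edits and 35 file opens in the logs above would never reach the 1024-edit threshold before the task was preempted.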