From: Jean-Marc Spaggiari
Date: Thu, 1 Aug 2013 08:51:49 -0400
Subject: Re: AssignmentManager looping?
To: user@hbase.apache.org

I can't check META since HBase is down.

Regarding HDFS, I took a few random lines like:

2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing
2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split

and each time, searching HDFS for the region returns nothing:

hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895

On the ZK side:

[zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog
[zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned
[28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470, b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6, 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759]

Can I just delete those znodes? In the worst case, will hbck find them again from HDFS if required?

JM

2013/8/1 Kevin O'dell
> Does it exist in META or HDFS?
> On Aug 1, 2013 8:24 AM, "Jean-Marc Spaggiari" wrote:
>
> > My master keeps logging this:
> >
> > 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> >
> > hbase@node3:~/hbase-0.94.3$ cat logs/hbase-hbase-master-node3.log* | grep "Region 270a9c371fcbe9cd9a04986e0b77d16b not found " | wc
> > 5042 65546 927728
> >
> > Then it crashed:
> >
> > 2013-07-31 22:22:46,072 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> > 2013-07-31 22:22:46,073 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00
> > http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6. state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
> > java.lang.IllegalStateException: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00
> > http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6. state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
> >     at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> >     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >     at java.lang.Thread.run(Thread.java:722)
> > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster$2: node3,60000,1375322220614-BalancerChore exiting
> > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.CatalogJanitor: node3,60000,1375322220614-CatalogJanitor exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
> > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-node3,60000,1375322220614.archivedHFileCleaner exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-node3,60000,1375322220614.oldLogCleaner exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
> > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> > 2013-07-31 22:22:46,078 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
> > 2013-07-31 22:22:46,127 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 22:22:46,127 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 22:22:46,181 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region aff4d1d8bf470458bb19525e8aef0759 not found on server node2,60020,1375319046072; failed processing
> > 2013-07-31 22:22:46,181 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region aff4d1d8bf470458bb19525e8aef0759 from server node2,60020,1375319046072 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 22:22:46,193 ERROR org.apache.hadoop.hbase.executor.ExecutorService: Cannot submit [ClosedRegionHandler-node3,60000,1375322220614-179] because the executor is missing. Is this process shutting down?
> > 2013-07-31 22:22:46,250 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing
> > 2013-07-31 22:22:46,250 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 22:22:46,262 INFO org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: node3,60000,1375322220614.splitLogManagerTimeoutMonitor exiting
> > 2013-07-31 22:22:46,293 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
> > 2013-07-31 22:22:46,293 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 22:22:46,294 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x240024f5666144b
> > 2013-07-31 22:22:46,361 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region aff4d1d8bf470458bb19525e8aef0759 not found on server node2,60020,1375319046072; failed processing
> > 2013-07-31 22:22:46,362 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region aff4d1d8bf470458bb19525e8aef0759 from server node2,60020,1375319046072 but it doesn't exist anymore, probably already processed its split
> > 2013-07-31 22:22:46,388 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: node3,60000,1375322220614.timeoutMonitor exiting
> > 2013-07-31 22:22:46,388 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: node3,60000,1375322220614.timerUpdater exiting
> > 2013-07-31 22:22:46,402 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
> > 2013-07-31 22:22:46,402 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> > java.lang.RuntimeException: HMaster Aborted
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> >     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100)
> >
> > It seems that hbck can't do anything. I will start looking at the files in HDFS, but suggestions are welcome.
> >
> > JM
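The two manual checks in this thread (grepping the master log for one region, and grepping bin/hadoop fs -lsr / for one region) can be run for every region at once. Below is a minimal POSIX shell sketch. The function names are hypothetical helpers, not part of HBase or Hadoop. It assumes that each log entry sits on a single line in the raw master log, that zkCli prints the children of /hbase/unassigned on one line as shown above, and that grep supports -o (GNU and BSD grep do).

```shell
# count_not_found_by_region (hypothetical helper): read master log lines on
# stdin and print "<count> <encoded region name>" for every region that the
# AssignmentManager reported as "not found on server", most frequent first.
count_not_found_by_region() {
  grep -o 'Region [0-9a-f]\{32\} not found on server' |
    awk '{ print $2 }' |   # keep only the 32-hex-char encoded region name
    sort | uniq -c |       # count occurrences per region
    sort -rn |             # highest count first
    awk '{ print $1, $2 }' # normalize uniq's padding to "count name"
}

# missing_in_hdfs (hypothetical helper): take the zkCli output of
# "ls /hbase/unassigned" as $1, e.g.
#   "[28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470]"
# read a recursive HDFS listing on stdin, and print each encoded region name
# that does not appear anywhere in that listing (same test as the manual grep).
missing_in_hdfs() {
  listing=$(cat)
  # Turn "[a, b, c]" into "a b c" and loop over the names.
  for region in $(printf '%s\n' "$1" | sed 's/[][,]/ /g'); do
    if ! printf '%s\n' "$listing" | grep -q -- "$region"; then
      echo "$region"
    fi
  done
}
```

Usage would be cat logs/hbase-hbase-master-node3.log* | count_not_found_by_region on the master, and bin/hadoop fs -lsr / | missing_in_hdfs "<the /hbase/unassigned line from zkCli>" for the znodes. Like the original grep, the second check only matches the encoded name as a substring of the listing; it does not assume any particular HDFS directory layout.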