Date: Sat, 31 Mar 2018 19:04:00 +0000 (UTC)
From: "Abhishek Kulkarni (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-19287) master hangs forever if RecoverMeta send assign meta region request to target server fail


    [ https://issues.apache.org/jira/browse/HBASE-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421431#comment-16421431 ]

Abhishek Kulkarni commented on HBASE-19287:
-------------------------------------------

2018-03-31 14:00:18,202 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.03 MB, freeSize=1.38 GB, max=1.38 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=3239, evicted=0, evictedPerRun=0.0
2018-03-31 14:00:18,208 INFO  [MobFileCache #0] mob.MobFileCache: MobFileCache Statistics, access: 0, miss: 0, hit: 0, hit ratio: 0%, evicted files: 0
2018-03-31 14:00:20,763 INFO  [regionserver/abhishekk3:16020.logRoller] wal.AbstractFSWAL: Rolled WAL /hbase/WALs/abhishekk3.pne.ven.veritas.com,16020,1522486816915/abhishekk3.pne.ven.veritas.com%2C16020%2C1522486816915.1522515620673 with entries=0, filesize=83 B; new WAL /hbase/WALs/abhishekk3.pne.ven.veritas.com,16020,1522486816915/abhishekk3.pne.ven.veritas.com%2C16020%2C1522486816915.1522519220738
2018-03-31 14:00:20,763 INFO  [regionserver/abhishekk3:16020.logRoller] wal.AbstractFSWAL: Archiving hdfs://abhishekk1.pne.ven.veritas.com:54310/hbase/WALs/abhishekk3.pne.ven.veritas.com,16020,1522486816915/abhishekk3.pne.ven.veritas.com%2C16020%2C1522486816915.1522515620673 to hdfs://abhishekk1.pne.ven.veritas.com:54310/hbase/oldWALs/abhishekk3.pne.ven.veritas.com%2C16020%2C1522486816915.1522515620673
2018-03-31 14:05:18,202 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.03 MB, freeSize=1.38 GB, max=1.38 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=3269, evicted=0, evictedPerRun=0.0

> master hangs forever if RecoverMeta send assign meta region request to target server fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-19287
>                 URL: https://issues.apache.org/jira/browse/HBASE-19287
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0
>            Reporter: Yi Liang
>            Assignee: Yi Liang
>            Priority: Major
>             Fix For: 2.0.0-beta-1, 2.0.0
>
>         Attachments: HBASE-19287-master-v3.patch, HBASE-19287-master-v3.patch, HBASE-19287-master-v4.patch, hbase-19287-master-v2.patch, master.patch
>
>
> 2017-11-10 19:26:56,019 INFO [ProcExecWrkr-1] procedure.RecoverMetaProcedure: pid=138, state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure failedMetaServer=null, splitWal=true; Retaining meta assignment to server=hadoop-slave1.hadoop,16020,1510341981454
> 2017-11-10 19:26:56,029 INFO [ProcExecWrkr-1] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454}]
> 2017-11-10 19:26:56,067 INFO [ProcExecWrkr-2] procedure.MasterProcedureScheduler: pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454 hbase:meta hbase:meta,,1.1588230740
> 2017-11-10 19:26:56,071 INFO [ProcExecWrkr-2] assignment.AssignProcedure: Start pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454; rit=OFFLINE, location=hadoop-slave1.hadoop,16020,1510341981454; forceNewPlan=false, retain=false
> 2017-11-10 19:26:56,224 INFO [ProcExecWrkr-4] zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,230 INFO [ProcExecWrkr-4] assignment.RegionTransitionProcedure: Dispatch pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454; rit=OPENING, location=hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,382 INFO [ProcedureDispatcherTimeoutThread] procedure.RSProcedureDispatcher: Using procedure batch rpc execution for serverName=hadoop-slave2.hadoop,16020,1510341988652 version=2097152
> 2017-11-10 19:26:57,542 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [hadoop-slave2.hadoop,16020,1510341988652]
> 2017-11-10 19:26:57,543 INFO [main-EventThread] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:58,875 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] master.ServerManager: Registering server=hadoop-slave1.hadoop,16020,1510342016106
> 2017-11-10 19:27:05,832 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] master.ServerManager: Registering server=hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] master.ServerManager: Triggering server recovery; existingServer hadoop-slave2.hadoop,16020,1510341988652 looks stale, new server:hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:27:49,815 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] client.RpcRetryingCallerImpl: tarted=38594 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on hadoop-slave2.hadoop,16020,1510342023184
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3290)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1370)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2401)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41544)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
> row 'hbase:namespace' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hadoop-slave2.hadoop,16020,1510341988652, seqNum=0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)