Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CABEE831C for ; Fri, 12 Aug 2011 07:57:17 +0000 (UTC) Received: (qmail 87693 invoked by uid 500); 12 Aug 2011 07:57:17 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 87375 invoked by uid 500); 12 Aug 2011 07:57:11 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 87211 invoked by uid 99); 12 Aug 2011 07:56:53 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Aug 2011 07:56:53 +0000 X-ASF-Spam-Status: No, hits=-2000.8 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Aug 2011 07:56:49 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 8A919B9949 for ; Fri, 12 Aug 2011 07:56:27 +0000 (UTC) Date: Fri, 12 Aug 2011 07:56:27 +0000 (UTC) From: "Ted Yu (JIRA)" To: issues@hbase.apache.org Message-ID: <1661409212.32583.1313135787564.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <213244034.36527.1302047286287.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (HBASE-3741) Make HRegionServer aware of the regions it's opening/closing MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-3741?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-3741: -------------------------- Fix Version/s: (was: 0.90.5) 0.90.3 > Make HRegionServer aware of the regions it's opening/closing > ------------------------------------------------------------ > > Key: HBASE-3741 > URL: https://issues.apache.org/jira/browse/HBASE-3741 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.1 > Reporter: Jean-Daniel Cryans > Assignee: Jean-Daniel Cryans > Priority: Blocker > Fix For: 0.90.3 > > Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch= , HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch > > > This is a serious issue about a race between regions being opened and clo= sed in region servers. We had this situation where the master tried to unas= sign a region for balancing, failed, force unassigned it, force assigned it= somewhere else, failed to open it on another region server (took too long)= , and then reassigned it back to the original region server. A few seconds = later, the region server processed the first closed and the region was left= unassigned. > This is from the master log: > {quote} > 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentMana= ger: Sent CLOSE to serverName=3Dsv4borg42,60020,1300920459477, load=3D(requ= ests=3D187, regions=3D574, usedHeap=3D3918, maxHeap=3D6973) for region stum= bles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,= 1266566087256.1470298961 > 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentMan= ager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x= 7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state= =3DPENDING_CLOSE, ts=3D1302041477758 > 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentMan= ager: Region has been PENDING_CLOSE for too long, running forced unassign a= gain on region=3Dstumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\= xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > ... > 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Forcing OFFLINE; was=3Dstumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE= \xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=3DCLOSED, = ts=3D1302041685733 > 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:= master:60000-0x42ec2cece810b68 Creating (or updating) unassigned node for = 1470298961 with OFFLINE state > ... > 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Using pre-existing plan for region stumbles_by_userid2,\x00'\x8E\xE8= \x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; pla= n=3Dhri=3Dstumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF= \x01\x10\xCC6,1266566087256.1470298961, src=3Dsv4borg42,60020,1300920459477= , dest=3Dsv4borg40,60020,1302041218196 > 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\x= A9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg40,60020,130= 2041218196 > 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentMan= ager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x= 7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state= =3DPENDING_OPEN, ts=3D1302041700944 > 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentMan= ager: Region has been PENDING_OPEN for too long, reassigning region=3Dstumb= les_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1= 266566087256.1470298961 > 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Forcing OFFLINE; was=3Dstumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE= \xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=3DPENDING_= OPEN, ts=3D1302041700944 > ... > 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: No previous transition plan was found (or we are ignoring an existin= g plan) for stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\x= DF\x01\x10\xCC6,1266566087256.1470298961 so generated a random one; hri=3Ds= tumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xC= C6,1266566087256.1470298961, src=3D, dest=3Dsv4borg42,60020,1300920459477; = 19 (online=3D19, exclude=3Dnull) available servers > 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\x= A9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg42,60020,130= 0920459477 > 2011-04-05 15:15:40,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeper= Watcher: master:60000-0x42ec2cece810b68 Received ZooKeeper Event, type=3DNo= deDataChanged, state=3DSyncConnected, path=3D/prodjobs/unassigned/147029896= 1 > 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: m= aster:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prod= jobs/unassigned/1470298961 and set watcher; region=3Dstumbles_by_userid2,'= =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=E7=A9=97=EF=BF=BD=EF=BF=BD=EF=BF=BD6,1= 266566087256, server=3Dsv4borg42,60020,1300920459477, state=3DRS_ZK_REGION_= OPENED > 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Handling transition=3DRS_ZK_REGION_OPENED, server=3Dsv4borg42,60020,= 1300920459477, region=3D1470298961 > 2011-04-05 15:15:42,222 DEBUG org.apache.hadoop.hbase.master.handler.Open= edRegionHandler: Handling OPENED event for 1470298961; deleting unassigned = node > ... > 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: m= aster:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prod= jobs/unassigned/1470298961 and set watcher; region=3Dstumbles_by_userid2,'= =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=E7=A9=97=EF=BF=BD=EF=BF=BD=EF=BF=BD6,1= 266566087256, server=3Dsv4borg42,60020,1300920459477, state=3DRS_ZK_REGION_= CLOSING > 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Handling new unassigned node: /prodjobs/unassigned/1470298961 (regio= n=3Dstumbles_by_userid2,'=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=E7=A9=97=EF= =BF=BD=EF=BF=BD=EF=BF=BD6,1266566087256, server=3Dsv4borg42,60020,130092045= 9477, state=3DRS_ZK_REGION_CLOSING) > 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentMa= nager: Handling transition=3DRS_ZK_REGION_CLOSING, server=3Dsv4borg42,60020= ,1300920459477, region=3D1470298961 > 2011-04-05 15:15:55,812 WARN org.apache.hadoop.hbase.master.AssignmentMan= ager: Received CLOSING for region 1470298961 from server sv4borg42,60020,13= 00920459477 but region was in the state null and not in expected PENDING_C= LOSE or CLOSING states > {quote} > And from sv4borg42: > {quote} > 2011-04-05 15:09:58,755 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xF= E\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:11:17,757 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xF= E\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xF= E\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,675 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8= \x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,700 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Closing stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xD= F\x01\x10\xCC6,1266566087256.1470298961: disabling compactions & flushes > 2011-04-05 15:14:45,701 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Updates disabled for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xF= E\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,701 INFO org.apache.hadoop.hbase.regionserver.HRegion= : Closed stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\= x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:14:45,758 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.CloseRegionHandler: Closed region stumbles_by_userid2,\x00'\x8E\xE8\x7F\x= FF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Received request to open region: stumbles_by_userid2,\x00'\x8E\xE8\= x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.OpenRegionHandler: Processing open of stumbles_by_userid2,\x00'\x8E\xE8\x= 7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:39,486 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Opening region: REGION =3D> {NAME =3D> 'stumbles_by_userid2,\x00'\x8E\xE= 8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256', STARTKEY =3D= > '\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6', ENDKEY =3D>= '\x00'\x9AU\x7F\xFF\xFE\xEBQ\xB0\xC3\xEF\x00Jr\xF2', ENCODED =3D> 14702989= 61, TABLE =3D> ... > 2011-04-05 15:15:39,487 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Instantiated stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\x= FC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:40,399 INFO org.apache.hadoop.hbase.regionserver.HRegion= : Onlined stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF= \x01\x10\xCC6,1266566087256.1470298961; next sequenceid=3D37627407247 > 2011-04-05 15:15:40,488 INFO org.apache.hadoop.hbase.catalog.MetaEditor: = Updated row stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\x= DF\x01\x10\xCC6,1266566087256.1470298961 in region .META.,,1 with server=3D= sv4borg42:60020, startcode=3D1300920459477 > 2011-04-05 15:15:40,582 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.OpenRegionHandler: Opened stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\x= E7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,776 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8= \x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Closing stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xD= F\x01\x10\xCC6,1266566087256.1470298961: disabling compactions & flushes > 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegio= n: Updates disabled for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xF= E\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,809 INFO org.apache.hadoop.hbase.regionserver.HRegion= : Closed stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\= x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,842 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.CloseRegionHandler: Closed region stumbles_by_userid2,\x00'\x8E\xE8\x7F\x= FF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,943 DEBUG org.apache.hadoop.hbase.regionserver.handle= r.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8= \x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 > 2011-04-05 15:15:55,943 WARN org.apache.hadoop.hbase.regionserver.handler= .CloseRegionHandler: Received CLOSE for region stumbles_by_userid2,\x00'\x8= E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961= but currently not serving > {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira