ambari-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Siddharth Wagle (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-15361) Fix ordering of Alter table calls which could result in Region Close issue
Date Thu, 10 Mar 2016 00:22:41 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-15361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188361#comment-15188361
] 

Siddharth Wagle commented on AMBARI-15361:
------------------------------------------

Modifications:
- Moved ALTER TTL calls to the HBaseAdmin call for updating policies
- Modified unit test to verify changes.

> Fix ordering of Alter table calls which could result in Region Close issue
> --------------------------------------------------------------------------
>
>                 Key: AMBARI-15361
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15361
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.2.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Critical
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15361-1.patch, AMBARI-15361.patch
>
>
> Two Alter table calls one of TTL and the other for setting the Region policies could
result in a possible race condition on HBase side. 
> Here is some details:
> The assignment operation on the region in question ({{2e13a2d264c50e27c32cd4d3cad6ebca}})
further indicates possible network partition - the region is CLOSED and OFFLINED successfully,
and trying to OPEN again, but it found conflict information from zk node (the zk node had
a stale information and still thought that it is still in the CLOSING state, which is not
true based on the log) - the assignment succeeds, but updating the assignment information
to zk failed due to incorrect version in the zk node.  
> {noformat}
> 2016-02-18 00:05:46,603 INFO  [os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,52093,1455753905161-org.apache.hadoop.hbase.master.BulkReOpen-0]
master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=OPEN, ts=1455753935244,
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to {2e13a2d264c50e27c32cd4d3cad6ebca
state=PENDING_CLOSE, ts=1455753946603, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
> 2016-02-18 00:05:46,605 INFO  [PriorityRpcServer.handler=16,queue=0,port=41497] regionserver.RSRpcServices:
Close 2e13a2d264c50e27c32cd4d3cad6ebca, moving to null
> 2016-02-18 00:05:46,629 INFO  [StoreCloserThread-METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.-1]
regionserver.HStore: Closed 0
> 2016-02-18 00:05:46,635 INFO  [RS_CLOSE_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-1]
regionserver.HRegion: Closed METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
> 2016-02-18 00:05:46,643 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition
{2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_CLOSE, ts=1455753946603, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
to {2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
> 2016-02-18 00:05:46,644 INFO  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Setting
node as OFFLINED in ZooKeeper for region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca,
NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY =>
'', ENDKEY => ''}
> 2016-02-18 00:05:46,644 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition
{2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
to {2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
> 2016-02-18 00:05:46,646 INFO  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Assigning
METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723
> 2016-02-18 00:05:46,646 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition
{2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
to {2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_OPEN, ts=1455753946646, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
> 2016-02-18 00:05:46,647 INFO  [PriorityRpcServer.handler=5,queue=1,port=41497] regionserver.RSRpcServices:
Open METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
> 2016-02-18 00:05:46,647 INFO  [PriorityRpcServer.handler=5,queue=1,port=41497] coordination.ZkOpenRegionCoordination:
Opening of region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.',
STARTKEY => '', ENDKEY => ''} failed, transitioning from OFFLINE to FAILED_OPEN in ZK,
expecting version 2
> 2016-02-18 00:05:46,649 WARN  [PriorityRpcServer.handler=5,queue=1,port=41497] regionserver.RSRpcServices:
Failed opening region METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received OPEN
for the region:METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. , which we are
already trying to CLOSE 
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1477)
> 	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-02-18 00:05:46,653 WARN  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Failed
assignment of METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723,
waiting a little before trying on the same region server try=1 of 10
> ...
> 2016-02-18 00:05:46,771 WARN  [RS_OPEN_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-0]
coordination.ZkOpenRegionCoordination: Unable to mark region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca,
NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY =>
'', ENDKEY => ''} as FAILED_OPEN. It's likely that the master already timed out this open
attempt, and thus another RS already has the region.
> 016-02-18 00:05:47,038 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn:
caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x152f1b1bbcb0014,
likely client has closed socket
> 	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> 	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-02-18 00:05:47,039 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn:
Closed socket connection for client /127.0.0.1:51677 which had sessionid 0x152f1b1bbcb0014
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message