phoenix-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexander Saraseka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4960) Write to table with global index failed if meta of index changed (split, move, etc)
Date Wed, 17 Apr 2019 19:10:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820419#comment-16820419
] 

Alexander Saraseka commented on PHOENIX-4960:
---------------------------------------------

Seems we have the same.

Version 4.14.1-cdh5.14.2 - during the split (not a region of table used in index), region
server dies due to "Could not update the index table, killing server region because couldn't
write to an index table"

Split event started: 
{code:java}
2019-04-04, 22:16:04.854 Transition null to {04bb34fa7f3f0696582d00ab4cf033a6 state=SPLITTING_NEW,
ts=1554405364851, server=hbase-data003',60020,1553912711884}
2019-04-04, 22:16:04.854 Transition {386b310a7f88bbcf17b1f1d2af6e9a65 state=OPEN, ts=1553912795381,
server=hbase-data003,60020,1553912711884} to {386b310a7f88bbcf17b1f1d2af6e9a65 state=SPLITTING,
ts=1554405364851, server=hbase-data003',60020,1553912711884}
2019-04-04, 22:16:04.853 Transition null to {6239476ebc6fefdd9d05917e1ea12262 state=SPLITTING_NEW,
ts=1554405364851, server=hbase-data003',60020,1553912711884}
2019-04-04, 22:16:04.840 Starting split of region SCHEMA_NAME:delivery_reports_gatewayResponses,BRAND1\x00\xC4\x0FYc\x16\x005065412\x0013473599063\x00d1acccb1af39ea36da8d3eaf9fc5fc35,1547818505037.386b310a7f88bbcf17b1f1d2af6e9a65.
{code}
Region server dies:
{code}
2019-04-04, 22:16:05.065	#339, the task was rejected by the pool. This is unexpected. Server
is hbase-data006,60020,1553912711613
2019-04-04, 22:16:05.069	#339, table=SCHEMA_NAME:SCHEMA_NAME_DELIVERY_REPORTS_BRANDTYPE_ACCOUNTID_GLOBAL_IDX,
attempt=1/1 failed=39ops, last exception: java.util.concurrent.RejectedExecutionException:
Task java.util.concurrent.FutureTask@7a7d7ea0 rejected from java.util.concurrent.ThreadPoolExecutor@6730bfae[Terminated,
pool size = 84, active threads = 0, queued tasks = 0, completed tasks = 1270825] on hbase-data006,60020,1553912711613,
tracking started Thu Apr 04 19:16:05 UTC 2019; not retrying 39 - final failure
2019-04-04, 22:16:05.157 Region server hbase-data003,60020,1553912711884 reported a fatal
error:
ABORTING region server hbase-data003,60020,1553912711884: Could not update the index table,
killing server region because couldn't write to an index table
Cause:
org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:  disableIndexOnFailure=true,
Failed to write to multiple index tables: [SCHEMA_NAME:SCHEMA_NAME_DELIVERY_REPORTS_BRANDTYPE_ACCOUNTID_GLOBAL_IDX,
SCHEMA_NAME:SCHEMA_NAME_DELIVERY_REPORTS_BRANDTYPE_CAMPAIGNID_GLOBAL_IDX, SCHEMA_NAME:SCHEMA_NAME_DELIVERY_REPORTS_BRANDTYPE_ACCOUNTID_FROMNUMBER_TONUMBER_GLOBAL_IDX]{code}

next time:
Split event started: 
{code}2019-04-15, 19:31:50.409	Transition null to {67eb24356528b9b4ce6a245a0e28b589 state=SPLITTING_NEW,
ts=1555345910406, server=hbase-data001,60020,1555061890520}
2019-04-15, 19:31:50.393	Starting split of region SCHEMA_NAME:delivery_reports_gatewayResponses,BRAND2\x00\xC6\x06\x06\x0D\x09Q\x04\x002870732177003\x0018159781806\x000fd9ec224f7079ba0297e99056f8d583,1553532655360.a80f646809cda80d56872cb899bacdbb.
2019-04-15, 19:31:50.411	Transition null to {30bbf7c667e736e766d5ac3a1f327f9f state=SPLITTING_NEW,
ts=1555345910408, server=hbase-data001,60020,1555061890520}
2019-04-15, 19:31:50.411	Transition {a80f646809cda80d56872cb899bacdbb state=OPEN, ts=1555062460270,
server=hbase-data001,60020,1555061890520} to {a80f646809cda80d56872cb899bacdbb state=SPLITTING,
ts=1555345910408, server=hbase-data001,60020,1555061890520}
{code}
Region server dies:
{code}
2019-04-15, 19:31:50.726	Successfully update INDEX_DISABLE_TIMESTAMP for SCHEMA_NAME:SCHEMA_NAME_DELIVERY_REPORTS_BRANDTYPE_ACCOUNTID_FROMNUMBER_TONUMBER_GLOBAL_IDX
due to an exception while writing updates. indexState=PENDING_DISABLE
2019-04-15, 19:31:50.732	ABORTING region server hbase-data001,60020,1555061890520: Could not
update the index table, killing server region because couldn't write to an index table
{code}

> Write to table with global index failed if meta of index changed (split, move, etc)
> -----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4960
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4960
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0
>            Reporter: Alex Batyrshin
>            Assignee: Vincent Poon
>            Priority: Critical
>             Fix For: 4.14.1
>
>         Attachments: PHOENIX-4960.v1.4.x-HBase-1.3.patch, PHOENIX-4960.v1.master.patch,
hbase-site-stage.xml
>
>
> Tested on released 4.14.0 and on commit [https://github.com/apache/phoenix/commit/52893c240e4f24e2bfac0834d35205f866c16ed8]
> Reproduce steps:
> 1. Create table with global index
> {code:sql}
> 0: jdbc:phoenix:127.0.0.1> CREATE TABLE test_meta_change (
> . . . . . . . . . . . . .>     pk VARCHAR NOT NULL PRIMARY KEY,
> . . . . . . . . . . . . .>     data VARCHAR
> . . . . . . . . . . . . .> );
> No rows affected (1.359 seconds)
> 0: jdbc:phoenix:127.0.0.1> CREATE INDEX test_meta_change_idx ON test_meta_change (data);
> No rows affected (6.333 seconds)
> {code}
> 2. Check that upsert works
> {code:sql}
> 0: jdbc:phoenix:127.0.0.1> UPSERT INTO test_meta_change VALUES ('a', 'foo');
> 1 row affected (0.186 seconds)
> {code}
> 3. Move index region via HBase shell
> {code:java}
> hbase(main):005:0> move 'index-region-hash'
> 0 row(s) in 0.1330 seconds
> {code}
> 4. Try to upsert again
> {code:sql}
> 0: jdbc:phoenix:127.0.0.1> UPSERT INTO test_meta_change VALUES ('b', 'bar');
> 18/10/08 03:32:10 WARN client.AsyncProcess: #1, table=TEST_META_CHANGE, attempt=1/35
failed=1ops, last exception: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException:
ERROR 1121 (XCL21): Write to the index failed.  disableIndexOnFailure=true, Failed
> to write to multiple index tables: [TEST_META_CHANGE_IDX] ,serverTimestamp=1538958729912,
>         at org.apache.phoenix.util.ServerUtil.wrapInDoNotRetryIOException(ServerUtil.java:265)
>         at org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailure(PhoenixIndexFailurePolicy.java:165)
>         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:161)
>         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>         at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
>         at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
>         at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
> Caused by: java.sql.SQLException: ERROR 1121 (XCL21): Write to the index failed.  disableIndexOnFailure=true,
Failed to write to multiple index tables: [TEST_META_CHANGE_IDX]
>         at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>         at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>         at org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailure(PhoenixIndexFailurePolicy.java:164)
>         ... 21 more
> Caused by: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:
 disableIndexOnFailure=true, Failed to write to multiple index tables: [TEST_META_CHANGE_IDX]
>         at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:236)
>         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>         ... 20 more
>  on stg-hbase5,60020,1538935600231, tracking started Mon Oct 08 03:32:09 MSK 2018; not
retrying 1 - final failure
> 1 row affected (0.471 seconds)
> {code}
> 5. Retry
> {code:java}
> 0: jdbc:phoenix:127.0.0.1> UPSERT INTO test_meta_change VALUES ('b', 'bar');
> Error: ERROR 1120 (XCL20): Writes to table blocked until index can be updated. tableName=TEST_META_CHANGE
(state=XCL20,code=1120)
> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until index can be
updated. tableName=TEST_META_CHANGE
>         at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>         at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>         at org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>         at org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>         at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>         at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>         at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:411)
>         at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>         at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:389)
>         at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
>         at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1825)
>         at sqlline.Commands.execute(Commands.java:822)
>         at sqlline.Commands.sql(Commands.java:732)
>         at sqlline.SqlLine.dispatch(SqlLine.java:813)
>         at sqlline.SqlLine.begin(SqlLine.java:686)
>         at sqlline.SqlLine.start(SqlLine.java:398)
>         at sqlline.SqlLine.main(SqlLine.java:291)
> {code}
> Logs from region server:
> {code}
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,926 WARN  [hconnection-0x49f74871-shared--pool12-t2]
client.AsyncProcess: #66, table=TEST_META_CHANGE_IDX, attempt=1/1 failed=1ops, last exception:
org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to: hostname=stg-hbase4
port=60020 startCode=1538935598023. As of locationSeqNum=6. on stg-hbase5,60020,1538935600231,
tracking started Mon Oct 08 03:32:09 MSK 2018; not retrying 1 - final failure
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,953 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6f725457 connecting to ZooKeeper
ensemble=10.73.66.21:2181,10.73.66.22:2181,10.73.66.181:2181
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,953 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
zookeeper.ZooKeeper: Initiating client connection, connectString=10.73.66.21:2181,10.73.66.22:2181,10.73.66.181:2181
sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@dcf4b2
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,957 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.73.66.22:2181)]
zookeeper.ClientCnxn: Opening socket connection to server 10.73.66.22/10.73.66.22:2181. Will
not attempt to authenticate using SASL (unknown error)
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,959 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.73.66.22:2181)]
zookeeper.ClientCnxn: Socket connection established to 10.73.66.22/10.73.66.22:2181, initiating
session
> Oct 08 03:32:09 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,962 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.73.66.22:2181)]
zookeeper.ClientCnxn: Session establishment complete on server 10.73.66.22/10.73.66.22:2181,
sessionid = 0x20009a64ef6a15c, negotiated timeout = 40000
> Oct 08 03:32:10 stg-hbase5 hbase[24993]: 2018-10-08 03:32:09,997 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for TEST_META_CHANGE_IDX
due to an exception while writing updates. indexState=PENDING_DISABLE
> Oct 08 03:32:10 stg-hbase5 hbase[24993]: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:
 disableIndexOnFailure=true, Failed to write to multiple index tables: [TEST_META_CHANGE_IDX]
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:236)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
> Oct 08 03:32:10 stg-hbase5 hbase[24993]: 2018-10-08 03:32:10,003 INFO  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
util.IndexManagementUtil: Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR
1121 (XCL21): Write to the index failed.  disableIndexOnFailure=true, Failed to write to multiple
index tables: [TEST_META_CHANGE_IDX] ,serverTimestamp=1538958729912,
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message