hbase-issues mailing list archives

From "Abhishek Singh Chouhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
Date Wed, 15 Feb 2017 12:45:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867762#comment-15867762
] 

Abhishek Singh Chouhan commented on HBASE-17069:
------------------------------------------------

[~Apache9] Yep, got a bit confused there :) rethrowIOException might be better.

Got to the bottom of things. Here's what was happening.
- Region A is being split into B & C. The first request with daughter info is a multi (carrying
the HRegionInfo) while the second one is a put.

In HRegion.processRowsWithLocks we have
{noformat}
// 6. Append no sync
if (!walEdit.isEmpty()) {
  // we use HLogKey here instead of WALKey directly to support legacy coprocessors.
  walKey = new HLogKey(this.getRegionInfo().getEncodedNameAsBytes(),
    this.htableDescriptor.getTableName(), WALKey.NO_SEQUENCE_ID, now,
    processor.getClusterIds(), nonceGroup, nonce, mvcc);
  txid = this.wal.append(this.htableDescriptor, this.getRegionInfo(),
      walKey, walEdit, false);
}
{noformat}

Since we pass false for inMemstore in append, we mess up the sequence id accounting. In
SequenceIdAccounting.update() we receive false for the multi request (let's say its sequence id
was 1), so lowestUnflushedSequenceIds is not updated.
Now for the second put, which goes through doMiniBatchMutation, we correctly pass true during
append (sequence id 2), and lowestUnflushedSequenceIds is set to 2 for the metafamily. The RS sends
its report using HRegion.setCompleteSequenceId, where it sets the last flushed sequence id for
this store to 1 (even though we haven't actually flushed anything yet).
- At this point the RS dies.
- During the WAL split we receive a lastflushedseqid of 1 for this store and filter out the cells
belonging to the multi, which carried the HRegionInfo. The regionserver that opens the region
next replays the remaining edits correctly, but we've lost the data belonging to the multi, and
hence the client fails with "HRegionInfo was null".
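
To make the sequence of events above concrete, here is a toy model of the accounting. It is only a sketch and not HBase code: the class SeqIdAccountingSketch, its methods, and the NONE sentinel are all hypothetical, and it assumes the reported last flushed sequence id is simply the lowest unflushed id minus one, as in the example with ids 1 and 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of per-store sequence id accounting. Hypothetical, not HBase code. */
public class SeqIdAccountingSketch {
    // Hypothetical sentinel meaning "nothing unflushed is being tracked".
    static final long NONE = Long.MAX_VALUE;

    private long lowestUnflushed = NONE;             // lowest tracked unflushed seq id
    private final List<Long> wal = new ArrayList<>(); // seq ids written to the WAL
    private long nextSeqId = 1;

    /** Appends an edit to the WAL; only edits flagged inMemstore are tracked. */
    long append(boolean inMemstore) {
        long seqId = nextSeqId++;
        wal.add(seqId);
        if (inMemstore && seqId < lowestUnflushed) {
            lowestUnflushed = seqId;
        }
        return seqId;
    }

    /** What the RS would report as flushed (assumption: lowest unflushed - 1). */
    long reportedLastFlushed() {
        return lowestUnflushed == NONE ? -1 : lowestUnflushed - 1;
    }

    /** Seq ids that survive the filtering done at WAL split time after a crash. */
    List<Long> replayAfterCrash() {
        long lastFlushed = reportedLastFlushed();
        List<Long> kept = new ArrayList<>();
        for (long seqId : wal) {
            if (seqId > lastFlushed) {
                kept.add(seqId);
            }
        }
        return kept;
    }

    /** Runs the multi (seq id 1) followed by the put (seq id 2), then crashes. */
    static List<Long> simulate(boolean multiInMemstore) {
        SeqIdAccountingSketch store = new SeqIdAccountingSketch();
        store.append(multiInMemstore); // the multi carrying the HRegionInfo
        store.append(true);            // the follow-up put
        return store.replayAfterCrash();
    }

    public static void main(String[] args) {
        System.out.println("inMemstore=false for the multi: " + simulate(false));
        System.out.println("inMemstore=true for the multi:  " + simulate(true));
    }
}
```

With false for the multi, only sequence id 2 survives the filter, so the edit with the HRegionInfo is dropped. With true, both edits are kept for replay.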

However, this case is not particular to splits or meta. It applies whenever a region has just
been opened and we do a number of multis followed by a put: if the RS dies before we flush, we
lose the data belonging to the multis. The fix is simply a one-line change :)
[~apurtell] [~lhofhansl]


> RegionServer writes invalid META entries for split daughters in some circumstances
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-17069
>                 URL: https://issues.apache.org/jira/browse/HBASE-17069
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Andrew Purtell
>            Assignee: Abhishek Singh Chouhan
>            Priority: Critical
>         Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, daughter_2_08629d59564726da2497f70451aafcdb.log,
logs.tar.gz, parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: Transition
null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, currentSize=14996112,
freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for big: blockCache=LruBlockCache{blockCount=34, currentSize=14996112,
freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, currentSize=17187656,
freeSize=12821524664, maxSize=12838712320, heapSize=17187656, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> {noformat}
> *The RegionServer onlines it*
> {noformat}
> 2016-11-08 04:23:27,015 INFO  [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830]
regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next sequenceid=19184
> 2016-11-08 04:23:27,029 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
regionserver.HRegionServer: Post open deploy tasks for IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> 2016-11-08 04:23:27,047 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
hbase.MetaTableAccessor: Updated row IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
with server=node-3.cluster,16020,1478578389506
> {noformat}
> *The Master transitions state from SPLITTING_NEW to OPEN*
> {noformat}
> 2016-11-08 04:23:27,058 INFO  [AM.ZK.Worker-pool2-t84] master.RegionStates: Transition
{d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579007057, server=node-3.cluster,16020,1478578389506}
to {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, server=node-3.cluster,16020,1478578389506}
> 2016-11-08 04:23:27,059 INFO  [AM.ZK.Worker-pool2-t84] master.AssignmentManager: Handled
SPLIT event; parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
daughter a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
daughter b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.,
on node-3.cluster,16020,1478578389506
> {noformat}
> *RegionServer updates META  - BUT APPARENTLY NOT CORRECTLY*
> {noformat}
> 2016-11-08 04:23:27,165 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
new regions: IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.. Split
took 6sec
> {noformat}
> *RegionServer delays flush*
> (Is this important?)
> {noformat}
> 2016-11-08 04:24:14,639 WARN  [MemStoreFlusher.0] regionserver.MemStoreFlusher: Region
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. has too many
store files; delaying flush up to 90000ms
> {noformat}
> *Immediate warnings about No serialized HRegionInfo*
> {noformat}
> 2016-11-08 04:24:44,691 WARN  [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor:
No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *Master is not happy either*
> {noformat}
> 2016-11-08 04:24:51,148 WARN  [MASTER_TABLE_OPERATIONS-node-1:16000-0] hbase.MetaTableAccessor:
No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *TestRunner MetaScanner complains about invalid entries in META missing HRegionInfo*
> {noformat}
> (standard input):9086:2016-11-08 05:04:17,230 WARN  [B.defaultRpcServer.handler=4,queue=1,port=16000]
hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:*
> {noformat}
> java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
>         at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
>         at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
