hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Abhishek Singh Chouhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
Date Wed, 22 Feb 2017 17:07:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878739#comment-15878739
] 

Abhishek Singh Chouhan commented on HBASE-17069:
------------------------------------------------

bq. What are these: "The first request with daughter info is a multi (with hregion info) while
the second one is a put." These are writes of the split info? What does the second put do?
Just trying to follow along.
Yep these are writes of the split info. The second put is when the region actually opens on
the rs, during postopen it sends a put that updates the location, this doesn't have hregioninfo.

bq. Why are mutations empty, if (!walEdit.isEmpty()) {, such that we go the route of '6. Append
no sync'? (An append with no data shouldn't go in to memstore so I suppose false is correct
in this case?)
The mutations are not empty, however we still were going the route of appending with inMemstore
as false.

Thanks [~stack] :)

> RegionServer writes invalid META entries for split daughters in some circumstances
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-17069
>                 URL: https://issues.apache.org/jira/browse/HBASE-17069
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4
>            Reporter: Andrew Purtell
>            Assignee: Abhishek Singh Chouhan
>            Priority: Blocker
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
>         Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, daughter_2_08629d59564726da2497f70451aafcdb.log,
HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch, HBASE-17069.master.001.patch,
logs.tar.gz, parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: Transition
null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, currentSize=14996112,
freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for big: blockCache=LruBlockCache{blockCount=34, currentSize=14996112,
freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, currentSize=17187656,
freeSize=12821524664, maxSize=12838712320, heapSize=17187656, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig:
Created cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95,
multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false,
cacheDataCompressed=false, prefetchOnOpen=false
> {noformat}
> *The RegionServer onlines it*
> {noformat}
> 2016-11-08 04:23:27,015 INFO  [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830]
regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next sequenceid=19184
> 2016-11-08 04:23:27,029 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
regionserver.HRegionServer: Post open deploy tasks for IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> 2016-11-08 04:23:27,047 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
hbase.MetaTableAccessor: Updated row IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
with server=node-3.cluster,16020,1478578389506
> {noformat}
> *The Master transitions state from SPLITTING_NEW to OPEN*
> {noformat}
> 2016-11-08 04:23:27,058 INFO  [AM.ZK.Worker-pool2-t84] master.RegionStates: Transition
{d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579007057, server=node-3.cluster,16020,1478578389506}
to {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, server=node-3.cluster,16020,1478578389506}
> 2016-11-08 04:23:27,059 INFO  [AM.ZK.Worker-pool2-t84] master.AssignmentManager: Handled
SPLIT event; parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
daughter a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
daughter b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.,
on node-3.cluster,16020,1478578389506
> {noformat}
> *RegionServer updates META  - BUT APPARENTLY NOT CORRECTLY*
> {noformat}
> 2016-11-08 04:23:27,165 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
new regions: IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.. Split
took 6sec
> {noformat}
> *RegionServer delays flush*
> (Is this important?)
> {noformat}
> 2016-11-08 04:24:14,639 WARN  [MemStoreFlusher.0] regionserver.MemStoreFlusher: Region
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. has too many
store files; delaying flush up to 90000ms
> {noformat}
> *Immediate warnings about No serialized HRegionInfo*
> {noformat}
> 2016-11-08 04:24:44,691 WARN  [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor:
No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *Master is not happy either*
> {noformat}
> 2016-11-08 04:24:51,148 WARN  [MASTER_TABLE_OPERATIONS-node-1:16000-0] hbase.MetaTableAccessor:
No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *TestRunner MetaScanner complains about invalid entries in META missing HRegionInfo*
> {noformat}
> (standard input):9086:2016-11-08 05:04:17,230 WARN  [B.defaultRpcServer.handler=4,queue=1,port=16000]
hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:*
> {noformat}
> java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
>         at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
>         at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException:
HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message