Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id C52C3200C32 for ; Thu, 23 Feb 2017 00:02:55 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id C3F82160B62; Wed, 22 Feb 2017 23:02:55 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 9149A160B7C for ; Thu, 23 Feb 2017 00:02:54 +0100 (CET) Received: (qmail 32311 invoked by uid 500); 22 Feb 2017 23:02:53 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 32299 invoked by uid 99); 22 Feb 2017 23:02:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 22 Feb 2017 23:02:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 342B518E16A for ; Wed, 22 Feb 2017 23:02:53 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.949 X-Spam-Level: X-Spam-Status: No, score=-0.949 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LOTSOFHASH=0.25, RP_MATCHES_RCVD=-2.999] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id hx4BVKR5I3Zn for ; Wed, 22 Feb 2017 23:02:49 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 1E0F161C63 for ; Wed, 22 Feb 2017 23:02:48 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 0F345E0A30 for ; Wed, 22 Feb 2017 23:02:47 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id EE95E2414C for ; Wed, 22 Feb 2017 23:02:45 +0000 (UTC) Date: Wed, 22 Feb 2017 23:02:45 +0000 (UTC) From: "stack (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 22 Feb 2017 23:02:55 -0000 [ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879421#comment-15879421 ] stack commented on HBASE-17069: ------------------------------- bq. From what i understood we enter the below block only when the walEdit is not empty and we have some cells in the wal edit from the processed mutations. You are right. I was wrong (misreading). It was probably set to 'false' because we haven't added the append to memstore yet... that happens next... which seems reasonable. bq. In SequenceIdAccounting.update() we pass false for the multirequest (lets say sequence id here was 1) so lowestunflusedsequenceid is not updated. The edit is in WAL, perhaps unsynced, but not in memstore (yet); a limbo. Another limbo is the gap while we wait for an edit to be stamped with a sequenceid (HBASE-17471 and friends help here) bq. Now for the second put that goes through doMiniBatchMutation we pass true correctly during append(Seq id 2). lowestUnflushedSequenceIds is set to 2 for the metafamily. Issue is we have two ways of writing (smile) and they don't agree which is especially fun in times of high concurrency where behind the scenes we are trying to keep an incrementing counter that aligns with order in which edits arrive. Setting true as the patch does seems like a low-risk way of keeping stuff aligned in branch-1.2/1.3. We should do a follow-on for branch-1 to correct the fuzzyness around inmemory flag and ensure that the rare edit that is not meant for memstore doesn't cause an issue similar to here because it has us skip out on sequenceid accounting. Thanks [~abhishek.chouhan] > RegionServer writes invalid META entries for split daughters in some circumstances > ---------------------------------------------------------------------------------- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug > Components: wal > Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4 > Reporter: Andrew Purtell > Assignee: Abhishek Singh Chouhan > Priority: Blocker > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5 > > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, daughter_2_08629d59564726da2497f70451aafcdb.log, HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch, HBASE-17069.master.001.patch, logs.tar.gz, parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for big: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, currentSize=17187656, freeSize=12821524664, maxSize=12838712320, heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false > {noformat} > *The RegionServer onlines it* > {noformat} > 2016-11-08 04:23:27,015 INFO [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830] regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next sequenceid=19184 > 2016-11-08 04:23:27,029 INFO [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] regionserver.HRegionServer: Post open deploy tasks for IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. > 2016-11-08 04:23:27,047 INFO [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] hbase.MetaTableAccessor: Updated row IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. with server=node-3.cluster,16020,1478578389506 > {noformat} > *The Master transitions state from SPLITTING_NEW to OPEN* > {noformat} > 2016-11-08 04:23:27,058 INFO [AM.ZK.Worker-pool2-t84] master.RegionStates: Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579007057, server=node-3.cluster,16020,1478578389506} to {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, server=node-3.cluster,16020,1478578389506} > 2016-11-08 04:23:27,059 INFO [AM.ZK.Worker-pool2-t84] master.AssignmentManager: Handled SPLIT event; parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., daughter a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., daughter b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb., on node-3.cluster,16020,1478578389506 > {noformat} > *RegionServer updates META - BUT APPARENTLY NOT CORRECTLY* > {noformat} > 2016-11-08 04:23:27,165 INFO [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., new regions: IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.. Split took 6sec > {noformat} > *RegionServer delays flush* > (Is this important?) > {noformat} > 2016-11-08 04:24:14,639 WARN [MemStoreFlusher.0] regionserver.MemStoreFlusher: Region IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. has too many store files; delaying flush up to 90000ms > {noformat} > *Immediate warnings about No serialized HRegionInfo* > {noformat} > 2016-11-08 04:24:44,691 WARN [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0} > {noformat} > *Master is not happy either* > {noformat} > 2016-11-08 04:24:51,148 WARN [MASTER_TABLE_OPERATIONS-node-1:16000-0] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0} > {noformat} > *TestRunner MetaScanner complains about invalid entries in META missing HRegionInfo* > {noformat} > (standard input):9086:2016-11-08 05:04:17,230 WARN [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} > *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:* > {noformat} > java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293) > at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185) > at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410) > at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359) > at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238) > at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154) > at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121) > at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486) > at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431) > at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)