Date: Fri, 23 Jun 2017 13:15:00 +0000 (UTC)
From: "Ankit Singhal (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Comment Edited] (PHOENIX-3964) Index.preWALRestore should handle index write failure

    [ https://issues.apache.org/jira/browse/PHOENIX-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060859#comment-16060859 ]

Ankit Singhal edited comment on PHOENIX-3964 at 6/23/17 1:14 PM:
-----------------------------------------------------------------

bq. PHOENIX-3072 only handles the index regions which are on the same RegionServer, but the index regions may also be on other RegionServers.

IMO, the index regions will come up on all RegionServers before the data region during the race.

bq. in Indexer.postOpen, the IOException will be caught, so the data region can open successfully even if the index write fails again

No, it will kill the region server because of KillServerOnFailurePolicy. And we will lose the cached edits, with no guarantee that we will get a chance to replay them again.
{code}
this.abortable.abort(msg, cause);
{code}

I'm now becoming sceptical about this change: a cached WAL replay after postOpen could overwrite the new index writes. If new overlapping writes come to the data table (and are eventually also written to the index table) while the region is already open and available, replaying the cached edits afterwards may overwrite them and result in data loss.


> Index.preWALRestore should handle index write failure
> -----------------------------------------------------
>
>                 Key: PHOENIX-3964
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3964
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: chenglei
>         Attachments: PHOENIX-3964_v1.patch
>
>
> When I restarted my HBase cluster at a certain time, I noticed some regions were in the FAILED_OPEN state and the RegionServers had error logs like the following:
> {code:java}
> 2017-06-20 12:31:30,493 ERROR [RS_OPEN_REGION-rsync:60020-3] handler.OpenRegionHandler: Failed open of region=BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,0100134e-7ddf-4d13-ab77-6f0d683e5fad_0,1487594358223.57a7be72f9beaeb4285529ac6236f4e5., starting to roll back the global memstore size.
> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: Failed to write to multiple index tables
>         at org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:221)
>         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:185)
>         at org.apache.phoenix.hbase.index.write.RecoveryIndexWriter.write(RecoveryIndexWriter.java:75)
>         at org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:554)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1312)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1549)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1308)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3338)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3220)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:823)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:716)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:687)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4596)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4566)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4538)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4494)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4445)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:465)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:139)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> When I looked at the code of the Indexer.preWALRestore method, I found that the RecoveryIndexWriter.write method is used to write the indexUpdates at line 543 below:
> {code:java}
> 526   public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> env, HRegionInfo info,
> 527       HLogKey logKey, WALEdit logEdit) throws IOException {
> 528     if (this.disabled) {
> 529       super.preWALRestore(env, info, logKey, logEdit);
> 530       return;
> 531     }
> 532     // TODO check the regions in transition. If the server on which the region lives is this one,
> 533     // then we should retry that write later in postOpen.
> 534     // we might be able to get even smarter here and pre-split the edits that are server-local
> 535     // into their own recovered.edits file. This then lets us do a straightforward recovery of each
> 536     // region (and more efficiently as we aren't writing quite as hectically from this one place).
> 537
> 538     /*
> 539      * Basically, we let the index regions recover for a little while longer before retrying in the
> 540      * hopes they come up before the primary table finishes.
> 541      */
> 542     Collection<Pair<Mutation, byte[]>> indexUpdates = extractIndexUpdate(logEdit);
> 543     recoveryWriter.write(indexUpdates, true);
> 544   }
> {code}
> But the RecoveryIndexWriter.write method, shown below, directly rethrows the exception for everything except non-existing tables. So RecoveryIndexWriter's failurePolicy (which is StoreFailuresInCachePolicy by default) never even gets an opportunity to run, and Indexer.failedIndexEdits, which is filled by StoreFailuresInCachePolicy, is always empty:
> {code:java}
>   public void write(Collection<Pair<Mutation, byte[]>> toWrite, boolean allowLocalUpdates) throws IOException {
>       try {
>           write(resolveTableReferences(toWrite), allowLocalUpdates);
>       } catch (MultiIndexWriteFailureException e) {
>           for (HTableInterfaceReference table : e.getFailedTables()) {
>               if (!admin.tableExists(table.getTableName())) {
>                   LOG.warn("Failure due to non existing table: " + table.getTableName());
>                   nonExistingTablesList.add(table);
>               } else {
>                   throw e;
>               }
>           }
>       }
>   }
> {code}
> As a result, the Indexer.postOpen method seems useless, because the updates Multimap at line 500 below, which is obtained from Indexer.failedIndexEdits, is always empty:
> {code:java}
> 499   public void postOpen(final ObserverContext<RegionCoprocessorEnvironment> c) {
> 500     Multimap<HTableInterfaceReference, Mutation> updates = failedIndexEdits.getEdits(c.getEnvironment().getRegion());
> 501
> 502     if (this.disabled) {
> 503       super.postOpen(c);
> 504       return;
> 505     }
> 506
> 507     //if we have no pending edits to complete, then we are done
> 508     if (updates == null || updates.size() == 0) {
> 509       return;
> 510     }
> 511
> 512     LOG.info("Found some outstanding index updates that didn't succeed during"
> 513         + " WAL replay - attempting to replay now.");
> 514
> 515     // do the usual writer stuff, killing the server again, if we can't manage to make the index
> 516     // writes succeed again
> 517     try {
> 518       writer.writeAndKillYourselfOnFailure(updates, true);
> 519     } catch (IOException e) {
> 520       LOG.error("During WAL replay of outstanding index updates, "
> 521           + "Exception is thrown instead of killing server during index writing", e);
> 522     }
> 523   }
> {code}
> So I think that in the Indexer.preWALRestore method we should use the recoveryWriter.writeAndKillYourselfOnFailure method to write the indexUpdates and handle index write failure, rather than just using RecoveryIndexWriter.write.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
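The failure-handling gap described in the report can be illustrated with a small self-contained sketch. All names here are simplified stand-ins, not Phoenix's real classes: the point is only the control flow. Swallowing just the missing-table failures and rethrowing everything else means the failure-cache policy never records anything, so a postOpen-style retry has nothing to replay; routing the failure through the policy instead fills the cache.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the PHOENIX-3964 report (hypothetical names, not Phoenix's API).
public class RecoveryWriteSketch {

    // Analogue of Indexer.failedIndexEdits, meant to be filled by the failure policy.
    static final List<String> failedEditsCache = new ArrayList<>();

    // Mirrors the shape of RecoveryIndexWriter.write: only missing-table
    // failures are swallowed; everything else is rethrown before any
    // failure policy can run, so the cache stays empty.
    static void write(String indexTable, boolean tableExists) throws IOException {
        try {
            throw new IOException("Failed to write to index table " + indexTable);
        } catch (IOException e) {
            if (!tableExists) {
                System.out.println("Failure due to non existing table: " + indexTable);
            } else {
                throw e; // the caching policy never sees this failure
            }
        }
    }

    // Mirrors the shape of the proposed writeAndKillYourselfOnFailure path:
    // the failure policy receives the failed edits and caches them for retry.
    static void writeAndCacheOnFailure(String indexTable) {
        try {
            throw new IOException("Failed to write to index table " + indexTable);
        } catch (IOException e) {
            failedEditsCache.add(indexTable); // StoreFailuresInCachePolicy analogue
        }
    }

    public static void main(String[] args) {
        try {
            write("IDX_A", true); // existing table -> exception propagates
        } catch (IOException e) {
            System.out.println("preWALRestore fails, cache size = " + failedEditsCache.size());
        }
        writeAndCacheOnFailure("IDX_A");
        System.out.println("via failure policy, cache size = " + failedEditsCache.size());
    }
}
```

Running the sketch shows a cache size of 0 after the direct-throw path and 1 after the policy path, which is the asymmetry the reporter's patch targets.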