Date: Thu, 7 Dec 2017 01:59:00 +0000 (UTC)
From: "James Taylor (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-4130) Avoid server retries for mutable indexes

    [ https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281234#comment-16281234 ]

James Taylor commented on PHOENIX-4130:
---------------------------------------

Thanks for the patch, [~vincentpoon]. I like this approach. A few comments/questions:
- Once the index is put into the PENDING_ACTIVE state, who puts it back into the ACTIVE state if the client retries succeed? Are you counting on the partial index rebuilder in MetaDataRegionObserver to do that? The downside of that is that we'd replay some edits that have already been processed, but if we expect this to be rare, perhaps that's acceptable. An alternative would be for Phoenix to handle the client retries and set the index state back to ACTIVE if they succeed. I'm not positive, but we may be able to detect from the client that retries were done, in which case we could set the index back to ACTIVE and clear the INDEX_DISABLE_TIMESTAMP (FYI, see below for a util function that does this).
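To make the transitions above concrete, here is a minimal standalone sketch of the state machine being discussed (write failure moves the index to PENDING_ACTIVE; a detected client-retry success restores ACTIVE and clears the disable timestamp). The class and field names below are illustrative only, not Phoenix's actual PIndexState or metadata classes:

```java
// Illustrative sketch of the proposed index-state transitions; names
// mirror the discussion, not Phoenix's real classes.
public class IndexStateSketch {
    enum PIndexState { ACTIVE, PENDING_ACTIVE, DISABLE }

    static final class IndexStatus {
        PIndexState state = PIndexState.ACTIVE;
        long indexDisableTimestamp = 0L; // stands in for INDEX_DISABLE_TIMESTAMP
    }

    // Server-side index write failure: record the failure timestamp but
    // keep the index usable, pending client retries.
    static void onWriteFailure(IndexStatus s, long failureTs) {
        s.state = PIndexState.PENDING_ACTIVE;
        s.indexDisableTimestamp = failureTs;
    }

    // Client retries succeeded: restore ACTIVE and clear the timestamp,
    // per the alternative suggested above.
    static void onClientRetriesSucceeded(IndexStatus s) {
        s.state = PIndexState.ACTIVE;
        s.indexDisableTimestamp = 0L;
    }

    public static void main(String[] args) {
        IndexStatus s = new IndexStatus();
        onWriteFailure(s, 1000L);
        System.out.println("after failure: " + s.state);
        onClientRetriesSucceeded(s);
        System.out.println("after successful retries: " + s.state);
    }
}
```

The open question in the bullet above is which component drives onClientRetriesSucceeded: the partial index rebuilder (with some replay of already-applied edits) or the Phoenix client itself.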
- One good side effect of putting the index into PENDING_ACTIVE is that if the client were to crash during the retries, the partial index rebuilder would do the replay and eventually disable the index if the replay took too long. If that were to happen, though, do you think the lag time is too long to keep using the index even though it's out of sync with the client (up to 45 minutes, where previously it would have been ~15 seconds)? Another (though more complex) alternative would be to introduce a state like PENDING_DISABLE, which would act the same as PENDING_ACTIVE except that on the client we could treat the index as disabled if it stays in that state longer than a configurable amount of time (i.e. the same amount of time we were willing to retry on the server side). That window would be much shorter than the default 45 minutes of keeping the index active until the partial index rebuilder gives up and marks it permanently disabled.
- One corner case that's not handled is when we're updating indexes within the RS. This can occur as an optimization when auto commit is on and a DML statement operates on a single table (e.g. DELETE FROM T). In that case, the commits occur in UngroupedAggregateRegionObserver.commit(), unfortunately outside of the abstraction we have in MutationState.
- We'll want to document that the QueryServices.INDEX_FAILURE_DISABLE_INDEX config property needs to be set on both the client and server side, since we'd rely on it to determine both the client behaviour and the server behaviour (where before it was only used on the server side). Not a huge deal, but if we can avoid needing to check it client side (maybe based on information thrown back in the exception?), then that'd be an improvement IMHO.
- You might want to consider using the IndexUtil.updateIndexState(conn, fullIndexTableName, PIndexState.DISABLE, 0L) method to disable the index instead of using the one you wrote.
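On the case-sensitivity concern with hand-built DDL, the core issue is standard SQL identifier quoting: an unquoted identifier is normalized to upper case, so a case-sensitive index name won't resolve. A minimal illustrative helper (hypothetical, not Phoenix's SchemaUtil, and simplified to quote each name separately):

```java
// Illustrative sketch of quoting case-sensitive SQL identifiers when
// building an ALTER INDEX ... DISABLE statement by hand.
public class QuoteIdentifierSketch {
    // Wrap an identifier in double quotes, doubling embedded quotes
    // (standard SQL delimited-identifier escaping).
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    static String disableIndexDdl(String indexName, String schemaName, String tableName) {
        return "ALTER INDEX " + quote(indexName)
                + " ON " + quote(schemaName) + "." + quote(tableName)
                + " DISABLE";
    }

    public static void main(String[] args) {
        // Without the quotes, "myIndex" would be upper-cased to MYINDEX
        // and could raise a TableNotFoundException.
        System.out.println(disableIndexDdl("myIndex", "MY_SCHEMA", "MY_TABLE"));
    }
}
```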
If you use yours, then you'd want to make sure to put the schema name, table name, and index name in double quotes in case they're case sensitive (as otherwise you'd get a TableNotFoundException).
{code}
    private void disableIndex(String dataTableFullName, String indexName) throws SQLException {
        logger.info(
                "Disabling index after hitting max number of index write retries: " + indexName);
        String disableIndexDDL = "ALTER INDEX %s ON %s DISABLE";
        connection.createStatement().execute(String.format(disableIndexDDL,
                SchemaUtil.getTableNameFromFullName(indexName), dataTableFullName));
    }
{code}

> Avoid server retries for mutable indexes
> ----------------------------------------
>
>                 Key: PHOENIX-4130
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4130
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Vincent Poon
>             Fix For: 4.14.0
>
>     Attachments: PHOENIX-4130.v1.master.patch
>
> Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon], during which I suggested that we can possibly eliminate the retry loops happening at the server, which cause the handler threads to be stuck potentially for quite a while (at least multiple seconds, to ride over common scenarios like splits). Instead, we can do the retries at the Phoenix client.
> So:
> # The index updates are not retried on the server (retries = 0).
> # A failed index update sets the failed index timestamp but leaves the index enabled.
> # The handler thread is now done; it throws an appropriate exception back to the client.
> # The Phoenix client can now retry. When those retries fail, the index is disabled (if the policy dictates that) and the exception is thrown back to the caller.
> So no more waiting is needed on the server; handler threads are freed immediately.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)