Date: Fri, 11 Jan 2019 18:50:14 -0000

karanmehta93 commented on issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#issuecomment-453618632

> There are always some jobs (might be a few in a day) failing, because few mappers in a job continuously failing and even retries can't get over the issue which causes the whole job to fail -- this is the case I'm talking about, and it happens more frequently when some bad thing happen in the cluster.

I understand your concern, and I also agree that it can happen often.
As you already pointed out, the simplest way to combat that is to retry the whole job (immediately or at fixed intervals) and hope that it eventually succeeds. If it does not, we can raise appropriate alerts through the monitoring infrastructure.

> In this case, I want the retry job to skip the regions whose stats have already been updated and only do minimal work, so it wouldn't worsen the bad situation in the cluster and we can easily catch up to avoid missing SLA. As the current phase, I'm ok to proceed without this skip check.

I understand the idea. Determining which regions' data is missing from the SYSTEM.STATS table is not possible (as part of this code), since the snapshot might have changed between the two jobs. A better way (in my understanding) to implement this feature would be a wrapper class for this tool that is aware of the job ID and other details of the previous job. It can ensure that the job runs on the same snapshot every time and that mappers are spawned only for the regions that still need work (or, even if mappers are launched, most of them are no-ops). At this point, I feel that we should skip it; however, feel free to add this as a potential enhancement to PHOENIX-5091.
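The wrapper idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Phoenix code: `StatsJobRetryWrapper`, the `StatsJob` interface, the region names and the alert callback are all hypothetical, and the sketch assumes the underlying MR job can report which regions it finished. It pins one snapshot name across attempts, resubmits only the regions that have not yet succeeded, waits between attempts, and hands any leftover regions to an alert hook once attempts run out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: not part of Phoenix. Every name here is illustrative.
class StatsJobRetryWrapper {

    /** Hypothetical job abstraction: collects stats for the given regions of one
     *  snapshot and returns the subset of regions that succeeded. */
    interface StatsJob {
        Set<String> run(String snapshotName, Set<String> regions);
    }

    private final StatsJob job;
    private final int maxAttempts;
    private final long retryDelayMillis;
    private final Consumer<Set<String>> alert;

    StatsJobRetryWrapper(StatsJob job, int maxAttempts, long retryDelayMillis,
                         Consumer<Set<String>> alert) {
        this.job = job;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
        this.alert = alert;
    }

    /** Retries until every region succeeds or attempts run out; alerts on leftovers. */
    boolean runToCompletion(String snapshotName, List<String> allRegions) {
        // Regions still waiting for stats; the same snapshot is reused on every attempt.
        Set<String> pending = new LinkedHashSet<>(allRegions);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            // Only the regions that have not succeeded yet are submitted.
            Set<String> succeeded = job.run(snapshotName,
                    Collections.unmodifiableSet(new LinkedHashSet<>(pending)));
            pending.removeAll(succeeded);
            if (!pending.isEmpty() && attempt < maxAttempts) {
                try {
                    Thread.sleep(retryDelayMillis); // back off before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        if (!pending.isEmpty()) {
            alert.accept(pending); // hand the unfinished regions to monitoring
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Usage sketch: a fake job in which region "r2" fails on the first attempt only.
        int[] calls = {0};
        StatsJobRetryWrapper wrapper = new StatsJobRetryWrapper(
                (snapshot, regions) -> {
                    calls[0]++;
                    System.out.println("attempt " + calls[0] + " on " + snapshot + ": " + regions);
                    Set<String> ok = new LinkedHashSet<>(regions);
                    if (calls[0] == 1) {
                        ok.remove("r2");
                    }
                    return ok;
                },
                3, 1000L,
                leftover -> System.out.println("ALERT: regions without stats: " + leftover));
        System.out.println("complete: "
                + wrapper.runToCompletion("stats_snapshot", Arrays.asList("r1", "r2", "r3")));
    }
}
```

Keeping the snapshot name fixed in the wrapper, rather than inside each job run, is what makes "skip the regions already done" meaningful: the set of regions being compared stays the same from one attempt to the next.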