Date: Wed, 3 May 2017 06:14:04 +0000 (UTC)
From: "Hive QA (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-16143) Improve msck repair batching

    [ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994321#comment-15994321 ]

Hive QA commented on HIVE-16143:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866074/HIVE-16143.06.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10650 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5013/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866074 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
>                 Key: HIVE-16143
>                 URL: https://issues.apache.org/jira/browse/HIVE-16143
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, HIVE-16143.03.patch, HIVE-16143.04.patch, HIVE-16143.05.patch, HIVE-16143.06.patch
>
> Currently, the {{msck repair table}} command batches the partitions it creates in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. The following snippet shows the batching logic. There are a couple of improvements that could be made to it:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding partitions one by one, which is almost always very slow. A user can easily increase the batch size hoping to make the command run faster and end up with worse performance because the code falls back to one-by-one adds. Users are then expected to find, by trial and error, a batch size that works well for their environment. The code could handle this situation better by exponentially decaying the batch size instead of falling back to one by one (see the first sketch after the quoted description).
> 2. The other issue with this implementation: if, say, the first batch succeeds and the second one fails, the code falls back to adding all the partitions one by one, irrespective of whether some of them were already added successfully. If we need to fall back to one-by-one adds, we should at least skip the partitions we know for sure were already added (see the second sketch below).
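
Point 1 above proposes exponentially decaying the batch size instead of immediately falling back to one-by-one adds. The following is a minimal, self-contained sketch of that idea, not Hive's actual implementation; {{createBatch}} is a hypothetical stand-in for the real bulk-add call (e.g. {{db.createPartitions(apd)}}) and is assumed to throw on failure:
{noformat}
import java.util.List;

public class DecayingBatchAdder {

  // Adds all items, halving the batch size after each failure instead of
  // immediately falling back to one-by-one adds.
  static <T> void addAll(List<T> items, int initialBatchSize) throws Exception {
    int pos = 0;
    int batchSize = Math.max(1, initialBatchSize);
    while (pos < items.size()) {
      int end = Math.min(pos + batchSize, items.size());
      try {
        createBatch(items.subList(pos, end)); // hypothetical bulk-add, throws on failure
        pos = end;                            // batch succeeded; advance past it
      } catch (Exception e) {
        if (batchSize == 1) {
          throw e;                            // even a single add failed; give up
        }
        batchSize /= 2;                       // decay and retry the same range
      }
    }
  }

  // Stand-in for the real metastore call, e.g. db.createPartitions(apd).
  static <T> void createBatch(List<T> batch) throws Exception {
    // ... real bulk-add would go here ...
  }
}
{noformat}
With this shape, one oversized initial batch costs only a few retries (log2 of the batch size) before the adds start succeeding, instead of degrading every remaining partition to a single metastore call.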
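
Point 2 proposes remembering which partitions were already added so that a fallback does not re-add them. A sketch under the same assumptions ({{createBatch}} is again a hypothetical stand-in):
{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TrackedBatchAdder {

  static <T> void addWithTracking(List<T> parts, int batchSize) {
    int size = Math.max(1, batchSize);
    Set<T> added = new LinkedHashSet<T>();
    try {
      for (int pos = 0; pos < parts.size(); pos += size) {
        List<T> batch = parts.subList(pos, Math.min(pos + size, parts.size()));
        createBatch(batch);  // hypothetical bulk-add, throws on failure
        added.addAll(batch); // record the batch only after it succeeds
      }
    } catch (Exception e) {
      // Fall back to one-by-one, but only for partitions not yet added.
      List<T> remaining = new ArrayList<T>(parts);
      remaining.removeAll(added);
      for (T part : remaining) {
        try {
          createBatch(Collections.singletonList(part));
        } catch (Exception single) {
          // log and continue, as msckAddPartitionsOneByOne does today
        }
      }
    }
  }

  // Stand-in for the real metastore call, e.g. db.createPartitions(apd).
  static <T> void createBatch(List<T> batch) throws Exception {
    // ... real bulk-add would go here ...
  }
}
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)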