hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16143) Improve msck repair batching
Date Wed, 03 May 2017 06:14:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994321#comment-15994321 ]

Hive QA commented on HIVE-16143:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866074/HIVE-16143.06.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10650 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5013/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866074 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
>                 Key: HIVE-16143
>                 URL: https://issues.apache.org/jira/browse/HIVE-16143
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, HIVE-16143.03.patch, HIVE-16143.04.patch, HIVE-16143.05.patch, HIVE-16143.06.patch
>
>
> Currently, the {{msck repair table}} command adds partitions to the metastore in batches
> whose size is controlled by the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. The following snippet
> shows the batching logic. There are a couple of improvements that could be made to it:
> {noformat} 
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
>           if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>             int counter = 0;
>             for (CheckResult.PartitionResult part : partsNotInMs) {
>               counter++;
>               apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>               repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>                   + ':' + part.getPartitionName());
>               if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>                 db.createPartitions(apd);
>                 apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>               }
>             }
>           } else {
>             for (CheckResult.PartitionResult part : partsNotInMs) {
>               apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>               repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>                   + ':' + part.getPartitionName());
>             }
>             db.createPartitions(apd);
>           }
>         } catch (Exception e) {
>           LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>           repairOutput.clear();
>           msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
>         }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding partitions one by one,
> which is almost always very slow. Users may well increase the batch size to make the command run
> faster, only to end up with worse performance because the code falls back to adding partitions
> one by one. Users are then expected to work out, by trial and error, a batch size that works well
> for their environment. I think the code could handle this situation better by exponentially
> decaying the batch size instead of falling back to one by one (see the first sketch below).
> 2. The other issue with this implementation is that if, say, the first batch succeeds and the
> second one fails, the code tries to add all the partitions one by one, irrespective of whether
> some of them were already added successfully. If we need to fall back to one by one, we should
> at least skip the ones which we know for sure were already added (see the second sketch below).
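
A minimal sketch of the exponential-decay idea from point 1 above. It is deliberately generic rather than tied to Hive internals: the Consumer stands in for a metastore call such as db.createPartitions(apd), and the class and method names are illustrative assumptions, not Hive APIs or the attached patch.
{noformat}
import java.util.List;
import java.util.function.Consumer;

public class DecayingBatchAdder {

  // Adds partitions in batches, halving the batch size on failure instead of
  // immediately falling back to one-by-one adds.
  static void addInBatches(List<String> partitionNames,
                           int initialBatchSize,
                           Consumer<List<String>> addBatch) {
    int start = 0;
    int batchSize = Math.max(1, initialBatchSize);
    while (start < partitionNames.size()) {
      int end = Math.min(start + batchSize, partitionNames.size());
      List<String> batch = partitionNames.subList(start, end);
      try {
        addBatch.accept(batch);                 // commit this batch to the metastore
        start = end;                            // success: advance to the next range
      } catch (RuntimeException e) {
        if (batchSize == 1) {
          throw e;                              // even a single-partition add failed; nothing left to decay
        }
        batchSize = Math.max(1, batchSize / 2); // decay and retry the same range with a smaller batch
      }
    }
  }
}
{noformat}
With an initial batch size of 64, for example, a failing batch would be retried at 32, 16, and so on down to 1 before the command gives up, instead of immediately degrading to one-by-one adds for the whole table.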
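
And a minimal sketch of point 2, under the same assumptions: partitions committed by successful batches are removed from a remaining list, so any fallback path only retries partitions that are not yet known to be in the metastore.
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchProgressTracker {

  // Adds partitions in fixed-size batches and returns only the partitions that
  // are not yet known to be in the metastore, so a fallback can retry just those.
  static List<String> addAndTrack(List<String> partitionNames,
                                  int batchSize,
                                  Consumer<List<String>> addBatch) {
    int step = Math.max(1, batchSize);
    List<String> remaining = new ArrayList<>(partitionNames);
    try {
      for (int start = 0; start < partitionNames.size(); start += step) {
        int end = Math.min(start + step, partitionNames.size());
        List<String> batch = partitionNames.subList(start, end);
        addBatch.accept(batch);       // commit this batch
        remaining.removeAll(batch);   // committed partitions no longer need a retry
      }
    } catch (RuntimeException e) {
      // Stop batching here; the caller can fall back (one-by-one or a smaller
      // batch size) over 'remaining' only, rather than re-adding everything.
    }
    return remaining;
  }
}
{noformat}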



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
