hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode
Date Thu, 02 Mar 2017 22:54:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893193#comment-15893193 ]

Sergio Peña commented on HIVE-16024:
------------------------------------

[~zsombor.klara]

I took another look at the code, and I think there might be an OOM problem in strict mode
even if we fetch all partitions in batches (using PartitionIterable). Here is the relevant
piece of code:

{noformat}
void checkTable(Table table, PartitionIterable parts,
    boolean findUnknownPartitions, CheckResult result)
    throws IOException, HiveException {
  ...
  Set<Path> partPaths = new HashSet<Path>();
  ...
  for (Partition partition : parts) {
    ...
    // Every partition that is missing on the filesystem is recorded in the
    // CheckResult, which is kept for the whole scan.
    if (!fs.exists(partPath)) {
      PartitionResult pr = new PartitionResult();
      pr.setPartitionName(partition.getName());
      pr.setTableName(partition.getTable().getTableName());
      result.getPartitionsNotOnFs().add(pr);
    }

    // The partition directory and its parents (one entry per partition
    // column) are added to partPaths, which also lives for the whole scan.
    for (int i = 0; i < partition.getSpec().size(); i++) {
      partPaths.add(partPath.makeQualified(fs));
      partPath = partPath.getParent();
    }
  }
  ...
{noformat}
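To make the growth of partPaths concrete, here is a small stand-alone sketch that mimics the quoted loop. It is a toy under stated assumptions: java.nio.file.Path stands in for Hadoop's Path, makeQualified is dropped, and the helper name addPartitionAndParents, the table location and the year/month/day layout are all hypothetical. Each partition with three partition columns adds its own directory plus two ancestors; the HashSet deduplicates shared parents, but the set still ends up with at least one entry per partition.

{noformat}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PartPathsGrowth {
    // Mirrors the quoted loop: add the partition directory, then its parents,
    // specSize entries in total, stopping before the table directory itself.
    static void addPartitionAndParents(Set<Path> partPaths, Path partPath, int specSize) {
        for (int i = 0; i < specSize; i++) {
            partPaths.add(partPath);
            partPath = partPath.getParent();
        }
    }

    public static void main(String[] args) {
        Path table = Paths.get("/warehouse/t"); // hypothetical table location
        Set<Path> partPaths = new HashSet<>();
        int partitions = 0;
        for (int year = 2000; year < 2010; year++) {
            for (int month = 1; month <= 12; month++) {
                for (int day = 1; day <= 28; day++) {
                    Path p = table.resolve("year=" + year)
                                  .resolve("month=" + month)
                                  .resolve("day=" + day);
                    addPartitionAndParents(partPaths, p, 3); // 3 partition columns
                    partitions++;
                }
            }
        }
        // Every partition contributes at least one entry, and nothing is
        // released until the whole scan finishes.
        System.out.println("partitions=" + partitions + " partPaths=" + partPaths.size());
    }
}
{noformat}

With 10 years × 12 months × 28 days the sketch prints partitions=3360 partPaths=3490: one entry per leaf directory plus 130 shared parent directories, so the set grows linearly with the partition count.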

My concern is that when MSCK runs against millions of partitions (fetched in batches) and
none of them exist on the filesystem, the code above adds every partition name to the
CheckResult object and every partition location (plus its parent directories) to the
temporary partPaths set. I have no measurements, but it is still an OOM risk. Should we
instead refactor that code so that MSCK handles partitions in batches end to end?
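One possible direction for such a refactor, sketched under loose assumptions: consume partitions from an iterator in fixed-size batches and hand each batch's missing partitions to a callback (for example, to log or write them out) instead of collecting them all in one CheckResult. The names checkInBatches, existsOnFs and onMissingBatch are hypothetical and not Hive APIs, and partitions are plain strings here. The sketch only covers the not-on-filesystem report; the partPaths set, which the findUnknownPartitions flag presumably relies on, would need a separate strategy.

{noformat}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class BatchedCheck {
    // Hypothetical batched existence check: partitions are pulled from the
    // iterator batchSize at a time, and each batch's missing partitions are
    // passed to onMissingBatch rather than accumulated in one growing result.
    static <P> void checkInBatches(Iterator<P> parts, int batchSize,
                                   Predicate<P> existsOnFs,
                                   Consumer<List<P>> onMissingBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<P> missing = new ArrayList<>(batchSize);
        int inBatch = 0;
        while (parts.hasNext()) {
            P p = parts.next();
            if (!existsOnFs.test(p)) {
                missing.add(p);
            }
            if (++inBatch == batchSize) {
                // End of a batch: flush its missing partitions, if any.
                if (!missing.isEmpty()) {
                    onMissingBatch.accept(missing);
                    missing = new ArrayList<>(batchSize);
                }
                inBatch = 0;
            }
        }
        // Flush the final partial batch.
        if (!missing.isEmpty()) {
            onMissingBatch.accept(missing);
        }
    }

    public static void main(String[] args) {
        // Simulate 10 partitions, none of which exist on the filesystem.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("ds=" + i);
        }
        checkInBatches(names.iterator(), 4, name -> false,
            batch -> System.out.println("missing batch: " + batch));
    }
}
{noformat}

Running main prints three batches, [ds=0, ds=1, ds=2, ds=3], [ds=4, ds=5, ds=6, ds=7] and [ds=8, ds=9], so the checker itself holds at most batchSize missing entries at a time; what the callback retains is up to the caller.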

> MSCK Repair Requires nonstrict hive.mapred.mode
> -----------------------------------------------
>
>                 Key: HIVE-16024
>                 URL: https://issues.apache.org/jira/browse/HIVE-16024
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, HIVE-16024.03.patch, HIVE-16024.04.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 changed the way we read partitions for a table to improve performance.
> Unfortunately, it uses PartitionPruner to load the partitions, which in turn checks
> hive.mapred.mode.
> The previous code did not check hive.mapred.mode.
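The regression described in the issue can be modeled with a toy: the same "load all partitions" request succeeds through a path that ignores hive.mapred.mode and fails through a pruner-style path that enforces strict mode when no partition filter is given. Both methods are hypothetical stand-ins, not Hive's PartitionPruner API, and the exception type and message are invented for illustration.

{noformat}
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class StrictModeToy {
    // Hypothetical stand-in for a pruner-style loader: in strict mode it
    // refuses to return every partition of a table when no filter is given.
    static List<String> loadViaPruner(Map<String, String> conf, List<String> all, String filter) {
        if ("strict".equals(conf.get("hive.mapred.mode")) && filter == null) {
            throw new IllegalStateException("strict mode: no partition filter");
        }
        return all; // real filtering is omitted in this toy
    }

    // Hypothetical stand-in for a direct listing that never consults
    // hive.mapred.mode, like the pre-HIVE-13788 code path.
    static List<String> loadDirect(List<String> all) {
        return all;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("hive.mapred.mode", "strict");
        List<String> all = Arrays.asList("ds=1", "ds=2");
        System.out.println("direct: " + loadDirect(all));
        try {
            loadViaPruner(conf, all, null);
        } catch (IllegalStateException e) {
            System.out.println("pruner: " + e.getMessage());
        }
    }
}
{noformat}

In this toy, setting hive.mapred.mode to nonstrict makes loadViaPruner return the full list, which mirrors the workaround implied by the issue title.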



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
