hive-dev mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-18709) Enable Compaction to work on more than one partition per job
Date Tue, 13 Feb 2018 21:36:00 GMT
Eugene Koifman created HIVE-18709:
-------------------------------------

             Summary: Enable Compaction to work on more than one partition per job
                 Key: HIVE-18709
                 URL: https://issues.apache.org/jira/browse/HIVE-18709
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
    Affects Versions: 1.0.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman


Currently, compaction launches one MR job per partition that needs to be compacted.
The number of tasks in each job equals the number of buckets in the table (or the number of
writers in the 'widest' write).
The number of AMs (ApplicationMasters) in a cluster is usually limited to a small percentage
of the nodes. Since each job needs its own AM, this limits how much compaction can be done
in parallel.

Investigate what it would take for a single job to be able to handle multiple partitions.
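
As a starting point for that investigation, here is a rough sketch of how partitions might be grouped so that one job (and so one AM) covers several of them. It is not Hive code. The names Partition, plan_jobs and max_tasks_per_job are hypothetical, and the greedy packing with a per-job task cap is only an assumed policy. It assumes each partition contributes one task per bucket, as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a partition queued for compaction."""
    name: str      # illustrative partition spec, e.g. "ds=2018-02-13"
    buckets: int   # tasks needed to compact this partition


def plan_jobs(partitions: List[Partition],
              max_tasks_per_job: int) -> List[List[Partition]]:
    """Greedily pack partitions into jobs, in order, capping tasks per job.

    A partition that alone exceeds the cap still gets a job of its own,
    which matches today's one-job-per-partition behaviour for that partition.
    """
    if max_tasks_per_job < 1:
        raise ValueError("max_tasks_per_job must be at least 1")

    jobs: List[List[Partition]] = []
    current: List[Partition] = []
    current_tasks = 0
    for p in partitions:
        # Close the current job if adding this partition would exceed the cap.
        if current and current_tasks + p.buckets > max_tasks_per_job:
            jobs.append(current)
            current, current_tasks = [], 0
        current.append(p)
        current_tasks += p.buckets
    if current:
        jobs.append(current)
    return jobs


if __name__ == "__main__":
    queued = [Partition(f"ds={i}", buckets=4) for i in range(6)]
    jobs = plan_jobs(queued, max_tasks_per_job=8)
    # One job per partition would need len(queued) AMs; batching needs len(jobs).
    print(len(queued), "AMs without batching,", len(jobs), "with batching")
```

The sketch only counts AMs. It says nothing about how a single job would track per-partition state or handle a failure in one partition, which is presumably where most of the real work lies.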
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
