hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
Date Wed, 25 May 2016 22:33:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301014#comment-15301014 ]

Eugene Koifman commented on HIVE-13354:
---------------------------------------

A couple of nits:
1. It seems like 'compactor.mapreduce.map.memory.mb' in
{quote}
    executeStatementOnDriver("CREATE TABLE " + tblName2 + "(a INT, b STRING) " +
        " CLUSTERED BY(a) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES (" +
        "'transactional'='true'," +
        "'compactor.mapreduce.map.memory.mb'='2048'," + // 2048 MB memory for compaction map job
        "'compactorthreshold.hive.compactor.delta.num.threshold'='4'," + // minor compaction if more than 4 delta dirs
        "'compactorthreshold.hive.compactor.delta.pct.threshold'='0.5'" + // major compaction if more than 50%
        ")", driver);
{quote}
is never tested.  Is it possible?

2. Perhaps property prefixes like "compactor." should be defined as symbolic constants, if not enums, somewhere.

Otherwise looks good.
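
To make nit 2 concrete, here is a minimal sketch of what symbolic constants for the two prefixes in the quoted test might look like. The class name `CompactorTableProps`, the constant names, and the helper `withPrefixStripped` are all hypothetical and are not taken from the patch. The idea that the prefix is stripped to recover the underlying key is an assumption inferred from the shape of the property names in the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the table-property prefixes seen in the quoted test. */
final class CompactorTableProps {
    /** Prefix on properties meant for the compaction MR job (assumed meaning). */
    static final String JOB_PREFIX = "compactor.";
    /** Prefix on per-table overrides of compaction thresholds (assumed meaning). */
    static final String THRESHOLD_PREFIX = "compactorthreshold.";

    private CompactorTableProps() {}

    /**
     * Returns the entries of tblProps whose key starts with the given prefix,
     * with that prefix removed from the key. Other entries are ignored.
     */
    static Map<String, String> withPrefixStripped(Map<String, String> tblProps, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tblProps.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }
}
```

With the properties from the test, `withPrefixStripped(props, CompactorTableProps.JOB_PREFIX)` would yield a single entry keyed `mapreduce.map.memory.mb`, which is also the kind of value a test for nit 1 could assert on.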

> Add ability to specify Compaction options per table and per request
> -------------------------------------------------------------------
>
>                 Key: HIVE-13354
>                 URL: https://issues.apache.org/jira/browse/HIVE-13354
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Eugene Koifman
>            Assignee: Wei Zheng
>              Labels: TODOC2.1
>         Attachments: HIVE-13354.1.patch, HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch
>
>
> Currently there are a few options that determine when automatic compaction is triggered. They are specified once for the warehouse.
> This doesn't make sense: some tables may be more important and need to be compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore. There is currently no way to control job parameters (such as memory) except to specify them in hive-site.xml for the metastore, which means they are site-wide.
> We should add a way to specify these per table (perhaps even per compaction, if launched via ALTER TABLE).
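
The layering described above (site-wide settings, per-table settings, and possibly per-request settings) can be sketched as a simple lookup with fallbacks. This is only an illustration: the class `CompactionOptionResolver` and its method are hypothetical, and the precedence order (request first, then table, then site) is an assumption rather than something the issue text specifies.

```java
import java.util.Map;

/** Hypothetical resolver for a compaction option across three configuration layers. */
final class CompactionOptionResolver {
    private CompactionOptionResolver() {}

    /**
     * Looks up key in the per-request properties first, then the table
     * properties, then the site-wide configuration (assumed precedence).
     * Returns null if no layer defines the key.
     */
    static String resolve(String key,
                          Map<String, String> requestProps,
                          Map<String, String> tableProps,
                          Map<String, String> siteConf) {
        if (requestProps.containsKey(key)) {
            return requestProps.get(key);
        }
        if (tableProps.containsKey(key)) {
            return tableProps.get(key);
        }
        return siteConf.get(key);
    }
}
```

Under this sketch, a table that sets `hive.compactor.delta.num.threshold` to 4 would use that value regardless of the site-wide setting, while a table that sets nothing would fall back to whatever hive-site.xml provides.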



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
