hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
Date Thu, 19 May 2016 23:18:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292324#comment-15292324 ]

Eugene Koifman commented on HIVE-13354:
---------------------------------------

{quote}
// intentionally set this high so that ttp1 will not trigger major compaction later on
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
{quote}
Could this be moved to where it's used?  It's confusing at its current location.

{quote}
runWorker(conf);  // compact ttp2
runWorker(conf);  // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" state seems suspicious after a successful runCleaner()...  Also, perhaps TxnStore.CLEANING_RESPONSE would be better than the string literal.

{quote}
// ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly inserted row (actual pct: 0.66)
{quote}
This seems wrong.  ttp2 had 5 rows which were major compacted into a base.  Now 2 more rows are added.  2/5 = 40%, which is below the 0.5 threshold.
Perhaps compaction is triggered because in this case ORC headers make up 99% of the file size.
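
To make the arithmetic concrete, here is a minimal sketch of a ratio-versus-threshold check. It assumes the rule is simply "delta / base strictly greater than the threshold"; the class and method names are hypothetical and this is not the actual compaction-initiation code. Whether the real inputs are row counts or file sizes is exactly the open question above.

```java
// Hypothetical sketch; not Hive's actual compaction-initiation code.
public final class DeltaPctSketch {
    private DeltaPctSketch() {}

    /**
     * Returns true when delta / base strictly exceeds the threshold.
     * The rule itself is an assumption made for illustration.
     */
    public static boolean exceedsThreshold(long delta, long base, float threshold) {
        if (base <= 0) {
            throw new IllegalArgumentException("base must be positive");
        }
        return (float) delta / (float) base > threshold;
    }

    public static void main(String[] args) {
        // By row count: 2 new rows on top of a 5-row base.
        System.out.println(exceedsThreshold(2, 5, 0.5f)); // ttp2 threshold (tblproperties)
        System.out.println(exceedsThreshold(2, 5, 0.8f)); // ttp1 threshold (hive conf)
        // The 0.66 ratio mentioned in the test comment, expressed as 66/100.
        System.out.println(exceedsThreshold(66, 100, 0.5f));
        System.out.println(exceedsThreshold(66, 100, 0.8f));
    }
}
```

Under this assumed rule, a 2/5 ratio stays below both thresholds, while a 0.66 ratio crosses 0.5 but not 0.8, so the test comment's outcome only makes sense if the ratio is computed from something other than row counts.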

bq. Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(2).getState());
I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after runCleaner().  Why isn't it?

bq. Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));
Why "size4"?

{quote}
void compact(String dbname, String tableName, String partitionName, CompactionType type,
             Map<String, String> tblproperties) throws TException;
{quote}
This is a public API change, so we should probably deprecate the method with the old signature.
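
One way this could look, as a rough sketch only: keep the old four-argument overload, mark it @Deprecated, and have it delegate to the new one with an empty property map. The interface name CompactionClient, the stand-in CompactionType enum, and the RecordingClient helper are all hypothetical; the sketch also drops "throws TException" and uses a Java 8 default method to stay self-contained, which the real client interface may not be able to do.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of keeping the old signature as a deprecated overload.
public class DeprecationSketch {

    /** Stand-in for the real CompactionType; values are illustrative. */
    public enum CompactionType { MINOR, MAJOR }

    /** Hypothetical client interface; not the actual metastore client. */
    public interface CompactionClient {

        /** Old signature: kept for existing callers, delegates with no overrides. */
        @Deprecated
        default void compact(String dbname, String tableName, String partitionName,
                             CompactionType type) {
            compact(dbname, tableName, partitionName, type,
                    Collections.<String, String>emptyMap());
        }

        /** New signature carrying per-request table properties. */
        void compact(String dbname, String tableName, String partitionName,
                     CompactionType type, Map<String, String> tblproperties);
    }

    /** Tiny implementation that records what it received, for demonstration. */
    public static final class RecordingClient implements CompactionClient {
        public Map<String, String> lastProps;

        @Override
        public void compact(String dbname, String tableName, String partitionName,
                            CompactionType type, Map<String, String> tblproperties) {
            lastProps = tblproperties;
        }
    }
}
```

Existing callers of the four-argument form keep compiling (with a deprecation warning) and behave as if no per-request properties were given.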

{quote}
pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column is added here but the number of "?" is the same.  How does this work?
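
For what it's worth, a quick count over the statement as quoted in this message gives 14 column names against 13 "?" placeholders. The sketch below does that count mechanically; PlaceholderCheck and its methods are hypothetical helpers written for this comment, and the parsing is deliberately naive (first parenthesized list is taken as the column list).

```java
// Hypothetical helper that compares column count to placeholder count.
public final class PlaceholderCheck {
    private PlaceholderCheck() {}

    /** The INSERT statement exactly as quoted above. */
    public static final String QUOTED_SQL =
        "insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, CC_TABLE, CC_PARTITION, "
        + "CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, CC_END, "
        + "CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) "
        + "VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)";

    /** Counts the comma-separated names in the first parenthesized list. */
    public static int countColumns(String insertSql) {
        int open = insertSql.indexOf('(');
        int close = insertSql.indexOf(')', open);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no column list found");
        }
        String cols = insertSql.substring(open + 1, close).trim();
        return cols.isEmpty() ? 0 : cols.split(",").length;
    }

    /** Counts '?' characters; assumes none appear inside string literals. */
    public static int countPlaceholders(String sql) {
        int n = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("columns=" + countColumns(QUOTED_SQL)
            + " placeholders=" + countPlaceholders(QUOTED_SQL));
    }
}
```

A check like this could sit in a unit test next to the statement so that a column added without a matching placeholder fails before it reaches the database.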

{quote}
rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state, " +
    "cc_tblproperties from COMPLETED_COMPACTIONS order by cc_database, cc_table, " +
    "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from history?

etc


> Add ability to specify Compaction options per table and per request
> -------------------------------------------------------------------
>
>                 Key: HIVE-13354
>                 URL: https://issues.apache.org/jira/browse/HIVE-13354
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Eugene Koifman
>            Assignee: Wei Zheng
>              Labels: TODOC2.1
>         Attachments: HIVE-13354.1.patch, HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently there are a few options that determine when automatic compaction is triggered.  They are specified once for the warehouse.
> This doesn't make sense - some tables may be more important and need to be compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore.  There is currently no way to control job parameters (like memory, for example) except to specify them in hive-site.xml for the metastore, which means they are site-wide.
> We should add a way to specify these per table (perhaps even per compaction if launched via ALTER TABLE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
