hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data
Date Mon, 02 Oct 2017 19:22:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman resolved HIVE-14980.
-----------------------------------
    Resolution: Not A Bug

Fixed in HIVE-15202.

> Minor compaction when triggered simultaneously on the same table/partition deletes data
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-14980
>                 URL: https://issues.apache.org/jira/browse/HIVE-14980
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 2.1.0
>            Reporter: Mahipal Jupalli
>            Assignee: Mahipal Jupalli
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after each INSERT into TABLEB from TABLEA, the compactions are picked up asynchronously by random metastores and step on each other, which causes data to be deleted.
> Example here: 
> TABLEA - has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should see 20k rows in TABLEB, but I see only 10k. Only the rows INSERTED immediately before the last compaction persist; the older rows are deleted (I believe the old delta files are deleted).
> To further confirm the bug: if I run only one compaction after both inserts, I see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code. The fix requires an additional check in the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active compactions on the table/partition. I will share the details of the fix once I have tested it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
