hive-dev mailing list archives

From "Jihong Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
Date Mon, 05 Jan 2015 04:44:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264177#comment-14264177 ]

Jihong Liu commented on HIVE-8966:
----------------------------------

I tested the new version. It generally works as expected, but compaction always fails in the
following case:

1. The writer exits without closing a batch, for whatever reason, so the "length" file is left
behind. This can happen, for example, when the program is killed or Hive or the server restarts.
2. The program is restarted, so a new writer and a new batch are created and continue to write
into the same partition. The data goes to a new delta.
3. We manually delete the "length" file in the previous delta and then run compaction, but
it fails. Even if we exit the program completely, so that there is no open batch and no "length"
file, compaction never succeeds for this partition.

However, the current Hive 0.14.0 handles the above case correctly.
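
The scenario above hinges on a "length" side file being left in an older delta. The following is a minimal sketch in Python of how one might check a partition directory for such leftovers before compacting. It walks a local directory as a stand-in for HDFS. The `delta_` directory prefix, the zero-padded names in the demo layout, and both helper functions are assumptions made for illustration and are not part of Hive.

```python
import os
import re
import tempfile

# Assumed naming, based on the issue description: the side file is called
# bucket_<n>_flush_length and lives inside a delta directory.
SIDE_FILE = re.compile(r"^bucket_\d+_flush_length$")


def find_flush_length_files(partition_dir):
    """Return sorted relative paths of side files found in delta directories."""
    found = []
    for entry in sorted(os.listdir(partition_dir)):
        delta_path = os.path.join(partition_dir, entry)
        # Assumption: delta directory names start with "delta_".
        if not (entry.startswith("delta_") and os.path.isdir(delta_path)):
            continue
        for name in sorted(os.listdir(delta_path)):
            if SIDE_FILE.match(name):
                found.append(os.path.join(entry, name))
    return found


def build_demo_partition(root):
    """Create a hypothetical layout: one abandoned delta, one closed cleanly."""
    layout = {
        "delta_0000001_0000010": ["bucket_00000", "bucket_00000_flush_length"],
        "delta_0000011_0000020": ["bucket_00000"],
    }
    for delta, files in layout.items():
        os.makedirs(os.path.join(root, delta))
        for name in files:
            open(os.path.join(root, delta, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_partition(root)
        print(find_flush_length_files(root))
```

A check like this only reports which deltas still carry a side file. It does not tell whether a writer is still active on them, so it is a diagnostic aid rather than a fix.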

> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.1
>
>         Attachments: HIVE-8966.2.patch, HIVE-8966.3.patch, HIVE-8966.patch
>
>
> Hive HCatalog streaming also creates a file named bucket_n_flush_length in each delta
> directory, where "n" is the bucket number. compactor.CompactorMR treats this file as one that
> also needs to be compacted. The file cannot be compacted, so compactor.CompactorMR does not
> go on to perform the compaction.
> In a test, after the bucket_n_flush_length file was removed, "alter table partition
> compact" finished successfully. If the file is not deleted, nothing is compacted.
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
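
The issue description comes down to a file listing that does not distinguish bucket data files from their side files. As a rough illustration of that distinction only, the Python sketch below groups file names by bucket number and skips anything ending in `_flush_length`. It is not CompactorMR's actual logic and not the attached patch. The function names and the `bucket_<digits>` pattern are assumptions.

```python
import re

# Assumed patterns, based on the file names mentioned in the issue description.
BUCKET_FILE = re.compile(r"^bucket_(\d+)$")
SIDE_FILE_SUFFIX = "_flush_length"


def is_side_file(name):
    """True for side files such as bucket_<n>_flush_length."""
    return name.endswith(SIDE_FILE_SUFFIX)


def compaction_inputs(file_names):
    """Group bucket data files by bucket number, ignoring side files."""
    buckets = {}
    for name in file_names:
        if is_side_file(name):
            continue  # side files hold no rows to compact
        match = BUCKET_FILE.match(name)
        if match:
            buckets.setdefault(int(match.group(1)), []).append(name)
    return buckets


if __name__ == "__main__":
    listing = ["bucket_00000", "bucket_00000_flush_length", "bucket_00001"]
    print(compaction_inputs(listing))
```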



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
