hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions
Date Sat, 18 Sep 2010 00:55:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910880#action_12910880

Ning Zhang commented on HIVE-1655:

Actually, the _tmp files are already handled by FSPaths.commit(), which is called from FileSinkOperator.close(),
and any _tmp* files that are missed are removed in jobClose() via Utilities.removeTempOrDuplicateFiles().
The only missing piece is removing the empty directories at jobClose().
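A minimal sketch of that missing piece, bottom-up removal of empty directories, is below. It assumes a local directory tree stands in for the DP output directory on HDFS and uses java.nio.file rather than Hadoop's FileSystem API. The class and method names (EmptyDirCleanup, removeEmptyDirs) are hypothetical and are not Hive code.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class EmptyDirCleanup {

    /**
     * Hypothetical helper (not Hive's API): deletes every empty directory under root,
     * but never root itself. Directories are checked in post-order, so a parent that
     * becomes empty once its children are removed is deleted in the same pass.
     * Returns the number of directories removed.
     */
    static int removeEmptyDirs(Path root) throws IOException {
        final int[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc; // propagate errors from listing this directory
                }
                if (!dir.equals(root) && isEmpty(dir)) {
                    Files.delete(dir);
                    removed[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    /** True if the directory has no entries at all. */
    static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dp");
        // ds=1 holds a committed file; ds=2/hr=0 was left empty after its _tmp file was removed.
        Files.createDirectories(root.resolve("ds=1"));
        Files.createFile(root.resolve("ds=1").resolve("000000_0"));
        Files.createDirectories(root.resolve("ds=2").resolve("hr=0"));
        System.out.println(removeEmptyDirs(root));               // 2 (hr=0, then ds=2)
        System.out.println(Files.exists(root.resolve("ds=1")));  // true
        System.out.println(Files.exists(root.resolve("ds=2")));  // false
    }
}
```

The post-order walk matters for multi-level dynamic partitions: a directory like ds=2 only becomes empty after ds=2/hr=0 has been deleted, so checking parents after their children avoids a second pass.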

> Adding consistency check at jobClose() when committing dynamic partitions
> -------------------------------------------------------------------------
>                 Key: HIVE-1655
>                 URL: https://issues.apache.org/jira/browse/HIVE-1655
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
> In the case of a dynamic partition insert, FileSinkOperator generates a directory for a new
> partition, and the files in that directory are named '_tmp*'. When a task succeeds, its file
> is renamed to remove the "_tmp", which effectively implements "commit" semantics. Many
> failures (a process getting killed, a machine dying, etc.) can leave _tmp files behind
> in the DP directory. These _tmp files should be deleted ("rolled back") at a successful
> jobClose(). After the deletion, we should also delete any empty directories.
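The commit-by-rename and rollback scheme described above can be sketched as follows. This is an illustration under stated assumptions, not FSPaths.commit() or Utilities.removeTempOrDuplicateFiles(): it assumes an uncommitted file is named "_tmp." followed by its final name, and it uses an atomic rename on the local filesystem (java.nio.file) as a stand-in for an HDFS rename. TmpFileCommit, commitTaskFile and rollbackTmpFiles are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TmpFileCommit {
    // Assumption for this sketch: an uncommitted file is named "_tmp." + finalName.
    static final String TMP_PREFIX = "_tmp.";

    /** Hypothetical task-side "commit": atomically rename _tmp.<name> to <name> in the same directory. */
    static Path commitTaskFile(Path tmpFile) throws IOException {
        String name = tmpFile.getFileName().toString();
        if (!name.startsWith(TMP_PREFIX)) {
            throw new IllegalArgumentException("not a temp file: " + tmpFile);
        }
        Path committed = tmpFile.resolveSibling(name.substring(TMP_PREFIX.length()));
        return Files.move(tmpFile, committed, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Hypothetical job-side "rollback": delete every leftover _tmp* file under root. */
    static int rollbackTmpFiles(Path root) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        // Collect first, then delete, so the tree is not modified while it is being walked.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().startsWith("_tmp"))
                 .forEach(leftovers::add);
        }
        for (Path p : leftovers) {
            Files.delete(p);
        }
        return leftovers.size();
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createDirectories(Files.createTempDirectory("dp").resolve("ds=1"));
        // One task succeeds and commits; another is killed and leaves its temp file behind.
        Path ok = Files.createFile(part.resolve("_tmp.000000_0"));
        Files.createFile(part.resolve("_tmp.000001_0"));
        commitTaskFile(ok);
        System.out.println(rollbackTmpFiles(part.getParent()));     // 1
        System.out.println(Files.exists(part.resolve("000000_0"))); // true
    }
}
```

Because the rename is the only step that makes a file visible under its final name, anything still carrying the _tmp prefix at job close belongs to a failed or killed attempt and can be discarded. This sketch does not handle duplicate outputs from speculative attempts, which the thread attributes to removeTempOrDuplicateFiles().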

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
