hive-issues mailing list archives

From "Jason Dere (JIRA)" <>
Subject [jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
Date Mon, 17 Jul 2017 23:53:00 GMT


Jason Dere updated HIVE-17113:
    Attachment: HIVE-17113.1.patch

Patch to switch the order of file operations during Utilities.mvFileToFinalPath() - move the
temp directory to the final location first, then remove duplicate bucket files.
[~ashutoshc] can you take a look?
[~rajesh.balamohan] FYI this may undo some of the file operation optimization you did in HIVE-14323.

> Duplicate bucket files can get written to table by runaway task
> ---------------------------------------------------------------
>                 Key: HIVE-17113
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-17113.1.patch
> Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is started
> 3. Task attempt A_1 finishes execution and saves its output to the temp directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.
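
The ordering problem described above can be reproduced with a small, single-threaded sketch on a local file system. This is not Hive's code: the class BucketRaceSketch, its helper methods, the "<bucketId>_<attemptId>" file naming, and the "keep the first attempt by name" policy are all assumptions made for illustration. The late write from A_0 is injected by hand between the two file operations, and it is assumed that a late writer recreates the temp directory if it is already gone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BucketRaceSketch {

    // Hypothetical naming: "<bucketId>_<attemptId>", e.g. "000000_1".
    static String bucketOf(Path file) {
        String name = file.getFileName().toString();
        int idx = name.indexOf('_');
        return idx < 0 ? name : name.substring(0, idx);
    }

    // Keep one file per bucket (first by name, an arbitrary policy for this sketch).
    static void removeDuplicates(Path dir) throws IOException {
        Map<String, List<Path>> byBucket;
        try (Stream<Path> files = Files.list(dir)) {
            byBucket = files.sorted()
                    .collect(Collectors.groupingBy(BucketRaceSketch::bucketOf));
        }
        for (List<Path> attempts : byBucket.values()) {
            for (Path extra : attempts.subList(1, attempts.size())) {
                Files.delete(extra);
            }
        }
    }

    // Simulates runaway attempt A_0 finally writing its output to the temp directory.
    // Assumption: the writer recreates the temp directory if it no longer exists.
    static void lateWrite(Path tmp) throws IOException {
        Files.createDirectories(tmp);
        Files.write(tmp.resolve("000000_0"), "late".getBytes());
    }

    // Returns the number of files that end up in the final directory.
    static int run(boolean moveFirst) throws IOException {
        Path root = Files.createTempDirectory("bucket-race");
        Path tmp = Files.createDirectory(root.resolve("_tmp"));
        Path fin = root.resolve("final");
        Files.write(tmp.resolve("000000_1"), "a1".getBytes()); // output of attempt A_1

        if (moveFirst) {
            Files.move(tmp, fin);      // patched order: move first ...
            lateWrite(tmp);            // ... so the late file lands outside the final location
            removeDuplicates(fin);     // ... then remove duplicates in the final location
        } else {
            removeDuplicates(tmp);     // original order: check for duplicates first ...
            lateWrite(tmp);            // ... A_0 writes after the check ...
            Files.move(tmp, fin);      // ... and its file is moved along with the rest
        }
        try (Stream<Path> files = Files.list(fin)) {
            return (int) files.count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("dedupe then move: " + run(false) + " file(s) in final dir");
        System.out.println("move then dedupe: " + run(true) + " file(s) in final dir");
    }
}
```

With the late write placed between the two operations, the first ordering leaves two files for the same bucket in the final directory, while the second leaves one. If the late write happened before the move instead, the duplicate check that follows the move would still see both files.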

This message was sent by Atlassian JIRA