hive-issues mailing list archives

From "Jason Dere (JIRA)" <>
Subject [jira] [Updated] (HIVE-17963) Fix for HIVE-17113 can be improved for non-blobstore filesystems
Date Fri, 03 Nov 2017 19:32:00 GMT


Jason Dere updated HIVE-17963:
    Attachment: HIVE-17963.2.patch

> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>                 Key: HIVE-17963
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-17963.1.patch, HIVE-17963.2.patch
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves on a file-by-file
> basis. For non-blobstore filesystems this results in many more filesystem/namenode operations
> compared to the previous Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename
> src dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described [here|]:
> 1) Move the temp directory to a new directory name, to prevent additional files from
> being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final location.
> This results in only one additional file operation on non-blobstore filesystems compared to
> the original Utilities.mvFileToFinalPath() behavior.
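
The three steps above can be sketched on a local filesystem as follows. This is a minimal illustration, not Hive's code: the function names only mirror the Hive methods mentioned, the "<taskId>_<attemptId>" file naming, the "_tmp." prefix, and the ".moved" suffix are assumptions, and keeping the largest attempt per task is an assumed dedup rule. It also assumes the final directory does not exist yet.

```python
import os
import tempfile
from pathlib import Path


def remove_temp_or_duplicate_files(directory: Path) -> None:
    """Delete in-progress files and keep one file per task id.

    Assumes names like '<taskId>_<attemptId>' and a '_tmp.' prefix for
    unfinished files. Keeping the largest attempt is an assumption.
    """
    best = {}
    for f in sorted(directory.iterdir()):
        if f.name.startswith("_tmp."):
            f.unlink()
            continue
        task_id = f.name.split("_")[0]
        current = best.get(task_id)
        if current is None:
            best[task_id] = f
        elif f.stat().st_size > current.stat().st_size:
            current.unlink()
            best[task_id] = f
        else:
            f.unlink()


def commit_output(tmp_dir: Path, final_dir: Path) -> None:
    # Step 1: rename the temp dir so runaway writers can no longer add files.
    renamed = tmp_dir.with_name(tmp_dir.name + ".moved")  # hypothetical suffix
    os.rename(tmp_dir, renamed)
    # Step 2: dedup inside the renamed directory.
    remove_temp_or_duplicate_files(renamed)
    # Step 3: a single directory rename to the final location.
    os.rename(renamed, final_dir)


def demo() -> list:
    with tempfile.TemporaryDirectory() as root:
        tmp_dir = Path(root) / "_tmp.-ext-10000"  # illustrative name
        tmp_dir.mkdir()
        (tmp_dir / "000000_0").write_text("partial")
        (tmp_dir / "000000_1").write_text("complete output")
        (tmp_dir / "000001_0").write_text("rows")
        (tmp_dir / "_tmp.000002_0").write_text("in progress")
        final_dir = Path(root) / "-ext-10000"
        commit_output(tmp_dir, final_dir)
        return sorted(p.name for p in final_dir.iterdir())


if __name__ == "__main__":
    print(demo())
```

Compared to a plain "dedup, then rename" sequence, the only extra operation here is the first directory rename, which matches the cost claim above.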
> The proposal is to do away with the config setting hive.exec.move.files.from.source.dir
> and always use behavior that should take care of the duplicate file issue described in HIVE-17113.
> For non-blobstore filesystems we will do steps 1-3 described above. For blobstore filesystems
> we will use the solution from HIVE-17113/HIVE-17813, which does the file-by-file copy. This
> should have the same number of file operations as a directory rename on a blobstore, which
> effectively results in file moves on a file-by-file basis.
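
The proposed selection between the two behaviors, without a config switch, could look roughly like the sketch below. The scheme list and both function names are assumptions for illustration; the issue text does not say how Hive detects a blobstore filesystem.

```python
from urllib.parse import urlparse

# Assumed set of blobstore URI schemes; illustrative only.
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def is_blobstore(path_uri: str) -> bool:
    """Return True if the URI scheme is in the assumed blobstore list."""
    return urlparse(path_uri).scheme.lower() in BLOBSTORE_SCHEMES


def choose_move_strategy(final_path_uri: str) -> str:
    """Pick the move behavior from the destination filesystem type."""
    if is_blobstore(final_path_uri):
        # Blobstore: a directory rename is a per-file move anyway.
        return "file-by-file"
    # Non-blobstore: rename temp dir, dedup, rename to final (steps 1-3).
    return "rename-dedup-rename"
```

For example, choose_move_strategy("hdfs://nn:8020/warehouse/t") selects the rename-based path, while an "s3a://" destination selects the file-by-file path.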

This message was sent by Atlassian JIRA
