hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17963) Fix for HIVE-17113 can be improved for non-blobstore filesystems
Date Fri, 03 Nov 2017 19:13:01 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238203#comment-16238203 ]

Jason Dere commented on HIVE-17963:
-----------------------------------

It looks like Utilities.mvFileToFinalPath() can be called when tmpPath does not actually exist,
which is what causes the failures with the patch. The extra directory rename should only be
performed if tmpPath exists.
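A minimal sketch of that guard, assuming a local filesystem accessed through java.nio.file as a stand-in for the Hadoop FileSystem calls that Utilities.mvFileToFinalPath() actually makes. The class GuardedRename and the method renameIfExists are hypothetical names, not Hive code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Hive code: java.nio stands in for the Hadoop FileSystem API.
public class GuardedRename {

    /**
     * Renames tmpPath to renamedPath only if tmpPath exists.
     * Returns true if the rename was performed, false if it was skipped.
     */
    public static boolean renameIfExists(Path tmpPath, Path renamedPath) throws IOException {
        if (!Files.exists(tmpPath)) {
            // No temp output was ever written, so skip the extra directory rename
            // instead of failing on a missing source directory.
            return false;
        }
        // Same-filesystem directory rename; renamedPath is assumed not to exist yet.
        Files.move(tmpPath, renamedPath);
        return true;
    }
}
```

Returning a boolean lets the caller skip the later dedup and final-move steps when there was nothing to rename.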

> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
>                 Key: HIVE-17963
>                 URL: https://issues.apache.org/jira/browse/HIVE-17963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-17963.1.patch
>
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves on a file-by-file
> basis. For non-blobstore filesystems this results in many more filesystem/namenode operations
> compared to the previous Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename
> src dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent any runaway processes from
> adding more files to it.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final location.
> This results in only one additional file operation on non-blobstore filesystems compared to
> the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to do away with the config setting hive.exec.move.files.from.source.dir
> and always use behavior that handles the duplicate file issue described in HIVE-17113.
> For non-blobstore filesystems we will do steps 1-3 described above. For blobstore filesystems
> we will use the solution from HIVE-17113/HIVE-17813, which does the file-by-file copy. This
> should need the same number of file operations as renaming a directory on a blobstore, since
> such a rename effectively moves the files one by one anyway.
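The proposal above can be sketched as follows. This is a hypothetical illustration, not the attached patch: java.nio.file on a local disk stands in for the Hadoop FileSystem API, the caller passes isBlobStore directly, and removeTempOrDuplicates is a simplified stand-in for removeTempOrDuplicateFiles() that assumes task output files are named `<taskId>_<attempt>` and keeps only the highest attempt of each task. The class, the method names, and the ".moved" suffix are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Hive code: java.nio stands in for the Hadoop FileSystem API.
public class FinalPathMover {

    // Assumed task-output naming: "<taskId>_<attempt>", e.g. 000000_0, 000000_1.
    private static final Pattern TASK_FILE = Pattern.compile("(\\d+)_(\\d{1,9})");

    /** Deletes "_tmp*" files and older attempts; returns survivors. Assumes flat files. */
    public static List<Path> removeTempOrDuplicates(Path dir) throws IOException {
        Map<String, Path> keptPath = new HashMap<>();
        Map<String, Integer> keptAttempt = new HashMap<>();
        List<Path> survivors = new ArrayList<>();
        List<Path> toDelete = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path f : entries) {
                String name = f.getFileName().toString();
                Matcher m = TASK_FILE.matcher(name);
                if (name.startsWith("_tmp")) {
                    toDelete.add(f);                 // leftover in-progress output
                } else if (!m.matches()) {
                    survivors.add(f);                // not task output; keep as-is
                } else {
                    String taskId = m.group(1);
                    int attempt = Integer.parseInt(m.group(2));
                    Integer prev = keptAttempt.get(taskId);
                    if (prev == null || attempt > prev) {
                        if (prev != null) {
                            toDelete.add(keptPath.get(taskId)); // older attempt loses
                        }
                        keptPath.put(taskId, f);
                        keptAttempt.put(taskId, attempt);
                    } else {
                        toDelete.add(f);
                    }
                }
            }
        }
        for (Path f : toDelete) {
            Files.delete(f);
        }
        survivors.addAll(keptPath.values());
        return survivors;
    }

    /** Non-blobstore: steps 1-3, i.e. two directory renames around the dedup pass. */
    public static void moveViaDirectoryRename(Path tmpDir, Path finalDir) throws IOException {
        // Step 1: rename first so runaway tasks can no longer add files to the directory.
        Path renamed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, renamed);
        // Step 2: dedup the renamed directory.
        removeTempOrDuplicates(renamed);
        // Step 3: one rename into the final location (assumed not to exist yet).
        Files.move(renamed, finalDir);
    }

    /** Blobstore: dedup, then move only the listed survivors one file at a time. */
    public static void moveFileByFile(Path tmpDir, Path finalDir) throws IOException {
        List<Path> survivors = removeTempOrDuplicates(tmpDir);
        Files.createDirectories(finalDir);
        for (Path f : survivors) {
            // Files written after the listing above are never moved.
            Files.move(f, finalDir.resolve(f.getFileName()));
        }
        // Cleanup of the (possibly non-empty) tmpDir is omitted from this sketch.
    }

    public static void moveToFinalPath(Path tmpDir, Path finalDir, boolean isBlobStore)
            throws IOException {
        if (!Files.exists(tmpDir)) {
            return; // no temp output at all, so there is nothing to rename or move
        }
        if (isBlobStore) {
            moveFileByFile(tmpDir, finalDir);
        } else {
            moveViaDirectoryRename(tmpDir, finalDir);
        }
    }
}
```

In the non-blobstore branch the two directory-level Files.move calls correspond to the "one additional file operation" mentioned in the description: the original behavior renamed the source directory once, while this flow renames it twice. In the blobstore branch the number of moves grows with the number of surviving files.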



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
