hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17963) Fix for HIVE-17113 can be improved for non-blobstore filesystems
Date Thu, 02 Nov 2017 21:15:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236608#comment-16236608 ]

Jason Dere commented on HIVE-17963:
-----------------------------------

cc [~ashutoshc] [~owen.omalley]

> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
>                 Key: HIVE-17963
>                 URL: https://issues.apache.org/jira/browse/HIVE-17963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-17963.1.patch
>
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves on a file-by-file basis. For non-blobstore filesystems this results in many more filesystem/namenode operations than the previous Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename src dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent additional files from being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final location.
> This results in only one additional file operation on non-blobstore filesystems compared to the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to do away with the config setting hive.exec.move.files.from.source.dir and always use behavior that takes care of the duplicate file issue described in HIVE-17113. For non-blobstore filesystems we will do steps 1-3 described above. For blobstore filesystems we will keep the solution from HIVE-17113/HIVE-17813, which does the file-by-file copy. This should require the same number of file operations as a directory rename on a blobstore, which effectively results in file moves on a file-by-file basis.
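
Steps 1-3 above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: it uses java.nio.file on a local filesystem instead of the Hadoop FileSystem API, and the class MoveSketch, its methods, the ".moved" suffix, and the dedup rule (keep the largest file per task id, where the task id is assumed to be the file name up to the first '_') are all hypothetical simplifications of what removeTempOrDuplicateFiles() and renameOrMoveFiles() do in Hive.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the rename / dedup / rename sequence (not Hive code).
final class MoveSketch {

    // Assumption: file names look like "<taskId>_<attempt>", e.g. "000000_1".
    static String taskIdOf(Path file) {
        String name = file.getFileName().toString();
        int idx = name.indexOf('_');
        return idx < 0 ? name : name.substring(0, idx);
    }

    // Step 2 (simplified): keep the largest file per task id, delete the rest.
    static void removeDuplicates(Path dir) throws IOException {
        // Snapshot the listing first so deletions do not race with iteration.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path f : stream) {
                if (Files.isRegularFile(f)) {
                    files.add(f);
                }
            }
        }
        Map<String, Path> keep = new HashMap<>();
        for (Path f : files) {
            String id = taskIdOf(f);
            Path prev = keep.get(id);
            if (prev == null) {
                keep.put(id, f);
            } else if (Files.size(f) > Files.size(prev)) {
                Files.delete(prev);
                keep.put(id, f);
            } else {
                Files.delete(f);
            }
        }
    }

    // Steps 1-3. Assumes finalDir does not exist yet and that both
    // directories live on the same filesystem, so each move is a rename.
    static void moveToFinalPath(Path tmpDir, Path finalDir) throws IOException {
        // Step 1: rename the temp dir so runaway writers can no longer add to it.
        Path sealed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, sealed, StandardCopyOption.ATOMIC_MOVE);
        // Step 2: dedup inside the renamed directory.
        removeDuplicates(sealed);
        // Step 3: a single directory rename to the final location.
        Files.move(sealed, finalDir);
    }
}
```

Calling MoveSketch.moveToFinalPath(tmpDir, finalDir) performs two directory renames plus the dedup pass, which matches the "one additional file operation" count in the description. When two attempts have the same size, this sketch keeps whichever one the directory listing returns first.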



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)