hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1852) Reduce unnecessary DFSClient.rename() calls
Date Thu, 16 Dec 2010 02:13:01 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1852:
-----------------------------

    Attachment: HIVE-1852.2.patch

@Joydeep, replaceFiles() is called by loadTable with the "replace" flag turned on, which means
it should overwrite the destination directory. Also, tmpPath is a temporary path that should
not exist before this call. Later in the function, tmpPath is renamed again to the destination
path (at which point the existing files in destf are removed).

The non-overwriting version is implemented in copyFiles().  So I think we don't need another
function. 

I also added a test case to the new patch to check that load data (with or without overwrite)
works as expected. There are no code changes from the first patch.

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
>                 Key: HIVE-1852
>                 URL: https://issues.apache.org/jira/browse/HIVE-1852
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.2.patch, HIVE-1852.patch
>
>
> On the Hive client side (MoveTask etc.), DFSClient.rename() is called for every file inside
> a directory. This is very expensive for a large directory on a busy DFS namenode. We should
> replace it with a single rename() call on the whole directory.
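
The difference between the two approaches can be sketched roughly as follows. This is
only an illustration, not the code in the patch: it uses java.nio.file on a local
filesystem as a stand-in for DFS, and the class name RenameSketch and both method names
are hypothetical. Each method returns the number of rename operations it issued.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; local-filesystem stand-in for DFS renames.
class RenameSketch {

    /** Per-file approach: one rename per file, so N files cost N rename calls. */
    static int moveFileByFile(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        // Snapshot the listing first so we do not modify the directory while iterating it.
        List<Path> files;
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.collect(Collectors.toList());
        }
        int renames = 0;
        for (Path file : files) {
            Files.move(file, dstDir.resolve(file.getFileName()));
            renames++;
        }
        return renames;
    }

    /** Directory approach: a single rename, regardless of how many files are inside. */
    static int moveWholeDirectory(Path srcDir, Path dstDir) throws IOException {
        // dstDir must not exist yet and must be on the same filesystem as srcDir;
        // otherwise Files.move throws instead of performing a simple rename.
        Files.move(srcDir, dstDir);
        return 1;
    }
}
```

For a directory with five files, moveFileByFile issues five renames while
moveWholeDirectory issues one. On a real DFS each rename is a request to the namenode,
which is why the per-file count matters there.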

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

