hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <>
Subject [jira] Commented: (HIVE-1852) Reduce unnecessary DFSClient.rename() calls
Date Mon, 20 Dec 2010 20:47:02 GMT


Joydeep Sen Sarma commented on HIVE-1852:

cool - the fsshell removal sounds good unless Yongqiang says something otherwise.

i am pretty sure this patch breaks the load command with a wildcard though. it seems to me that
the load command is simply passing the input path (with the wildcard pattern) to the loadTable/loadPartition
methods (via LoadTableDesc). these methods were previously capable of handling wildcards
that matched a set of files. now they will not be able to do that. Ning - can you confirm this?
(maybe add a test trying to load a wildcard pattern?)
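
A rough illustration of the wildcard concern, using the local filesystem and Python's standard library as a stand-in for DFS. This is not Hive's code: expand_load_path and load_with_wildcard are hypothetical names, and the only point is that a path containing a pattern is not itself a real path, so it has to be expanded into the matching files before anything can be moved.

```python
import glob
import os
import shutil
import tempfile


def expand_load_path(pattern):
    """Return the files matched by a load path that may contain a wildcard."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def load_with_wildcard(pattern, dest_dir):
    """Move every file matched by `pattern` into `dest_dir`; return their names."""
    matches = expand_load_path(pattern)
    if not matches:
        raise FileNotFoundError("no files match " + pattern)
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for src in matches:
        name = os.path.basename(src)
        os.rename(src, os.path.join(dest_dir, name))
        moved.append(name)
    return moved


def demo():
    """Stage three files, then load only those matching 'part-*'."""
    root = tempfile.mkdtemp()
    try:
        staging = os.path.join(root, "staging")
        os.makedirs(staging)
        for name in ("part-0", "part-1", "README"):
            with open(os.path.join(staging, name), "w") as f:
                f.write("x")
        pattern = os.path.join(staging, "part-*")
        # The pattern is not a real path, so a single rename of it cannot work.
        literal_exists = os.path.exists(pattern)
        moved = load_with_wildcard(pattern, os.path.join(root, "table"))
        return literal_exists, moved
    finally:
        shutil.rmtree(root)
```

A test along these lines (stage a few files, load with a pattern, check that only the matching files arrive) is roughly what a wildcard load test would need to cover.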

on a more minor note - the checkPaths call that got taken out was checking for the presence
of nested subdirectories inside the path being loaded. is this no longer necessary? (do we
support directories within partitions/tables automatically at query time?)
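
For reference, the kind of check described above can be sketched as follows. This is only a guess at the intent (reject or report a load path that contains subdirectories), written against the local filesystem rather than DFS; find_nested_dirs is a hypothetical helper and not the actual checkPaths logic.

```python
import os
import shutil
import tempfile


def find_nested_dirs(load_dir):
    """Return the names of immediate children of load_dir that are directories."""
    with os.scandir(load_dir) as entries:
        return sorted(e.name for e in entries if e.is_dir())


def demo():
    """Compare a flat directory with one that contains a subdirectory."""
    root = tempfile.mkdtemp()
    try:
        flat = os.path.join(root, "flat")
        nested = os.path.join(root, "nested")
        os.makedirs(flat)
        os.makedirs(os.path.join(nested, "sub"))
        for d in (flat, nested):
            with open(os.path.join(d, "part-0"), "w") as f:
                f.write("x")
        return find_nested_dirs(flat), find_nested_dirs(nested)
    finally:
        shutil.rmtree(root)
```

A caller could refuse the load whenever the returned list is non-empty, which is the behavior the question above is asking about.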

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>                 Key: HIVE-1852
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.2.patch, HIVE-1852.3.patch, HIVE-1852.patch
> In Hive client side (MoveTask etc), DFSClient.rename() is called for every file inside
> a directory. It is very expensive for a large directory in a busy DFS namenode. We should
> replace it with a single rename() call on the whole directory.
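
To make the cost difference in the description concrete, here is a minimal sketch that counts rename calls for the two strategies. It uses os.rename on the local filesystem as an assumed stand-in for DFSClient.rename(); move_per_file and move_whole_dir are illustrative names, not Hive functions.

```python
import os
import shutil
import tempfile


def move_per_file(src_dir, dest_dir):
    """Move each file individually; return the number of rename calls made."""
    os.makedirs(dest_dir, exist_ok=True)
    calls = 0
    for name in sorted(os.listdir(src_dir)):
        os.rename(os.path.join(src_dir, name), os.path.join(dest_dir, name))
        calls += 1
    return calls


def move_whole_dir(src_dir, dest_dir):
    """Move the directory with one rename; dest_dir must not exist yet."""
    os.rename(src_dir, dest_dir)
    return 1


def make_dir_with_files(parent, name, n_files):
    """Create parent/name containing n_files small files."""
    path = os.path.join(parent, name)
    os.makedirs(path)
    for i in range(n_files):
        with open(os.path.join(path, "part-%05d" % i), "w") as f:
            f.write("row\n")
    return path


def demo(n_files=100):
    """Return (per-file rename count, whole-dir rename count, same contents?)."""
    root = tempfile.mkdtemp()
    try:
        a = make_dir_with_files(root, "tmp_a", n_files)
        b = make_dir_with_files(root, "tmp_b", n_files)
        per_file = move_per_file(a, os.path.join(root, "dest_a"))
        whole = move_whole_dir(b, os.path.join(root, "dest_b"))
        same = (sorted(os.listdir(os.path.join(root, "dest_a")))
                == sorted(os.listdir(os.path.join(root, "dest_b"))))
        return per_file, whole, same
    finally:
        shutil.rmtree(root)
```

Both strategies leave the same files at the destination; the per-file version issues one rename per file, while the directory version issues one in total. Note that the single rename only applies when the destination directory does not already exist.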

