hadoop-common-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-2120) dfs -getMerge does not do what it says it does
Date Tue, 20 Sep 2011 07:08:09 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108408#comment-13108408 ]

Harsh J commented on HADOOP-2120:

I believe the sorting earlier referred to the file list sorting?

In that case, although FSNamesystem gives consistent sorting for HDFS's listStatus and the
like, note that Java's File APIs do not guarantee the same consistent ordering when getmerge
is run over a LocalFileSystem. I've opened HADOOP-7659 for this, btw.
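
To illustrate the LocalFileSystem point: java.io.File.listFiles() documents no ordering
guarantee for the array it returns, so a caller that needs a stable order has to sort
explicitly. Below is a minimal sketch, assuming a plain name-based ordering is what is
wanted. The class and method names (SortedLocalListing, sortByName, listSorted) are
hypothetical and are not taken from Hadoop or from HADOOP-7659.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Hadoop: lists local files in a stable, name-based order.
class SortedLocalListing {

    // Returns a copy of the given array sorted by file name (the input is left untouched).
    static File[] sortByName(File[] files) {
        File[] copy = files.clone();
        Arrays.sort(copy, Comparator.comparing(File::getName));
        return copy;
    }

    // Lists the regular files directly under dir, sorted by name.
    // File.listFiles() returns null when dir is not a readable directory.
    static File[] listSorted(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        return sortByName(files);
    }
}
```

With something like this, a merge over a local directory would pick up part files in the
same order on every run, regardless of the order the platform happens to return them in.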

> dfs -getMerge does not do what it says it does
> ----------------------------------------------
>                 Key: HADOOP-2120
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2120
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation, fs
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>              Labels: newbie
> dfs -getMerge, which calls FileUtil.CopyMerge, contains this javadoc:
> {code}
> Get all the files in the directories that match the source file pattern
>    * and merge and sort them to only one file on local fs 
>    * srcf is kept.
> {code}
> However, it only concatenates the set of input files, rather than merging them in sorted
> order. Ideally, the copyMerge should be equivalent to a map-reduce job with IdentityMapper
> and IdentityReducer with numReducers = 1. However, not having to run this as a map-reduce
> job has some advantages, since it increases cluster utilization during reduce phase.
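
The difference between concatenating and merging in sorted order can be shown on small
in-memory inputs. This is only a sketch under stated assumptions: each input file is modeled
as a list of lines, "sorted" is taken to mean the natural String ordering of whole lines,
and the class and method names (ConcatVersusMerge, concatenate, mergeSorted) are hypothetical.
It does not reproduce the actual FileUtil code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Hadoop code: each inner list stands in for one input file.
class ConcatVersusMerge {

    // Appends the inputs one after another, preserving the order within each input.
    static List<String> concatenate(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> input : inputs) {
            out.addAll(input);
        }
        return out;
    }

    // Gathers every line from every input and sorts the combined result.
    static List<String> mergeSorted(List<List<String>> inputs) {
        List<String> out = concatenate(inputs);
        Collections.sort(out);
        return out;
    }
}
```

For two inputs holding the lines (b, d) and (a, c), concatenate yields b, d, a, c while
mergeSorted yields a, b, c, d. The javadoc quoted above describes the second result; the
reported behavior corresponds to the first.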

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

