hadoop-common-dev mailing list archives

From "lohit vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2120) dfs -getMerge does not do what it says it does
Date Wed, 31 Oct 2007 20:04:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539170 ]

lohit vijayarenu commented on HADOOP-2120:

Visualizing this as a map-reduce job which actually merges/sorts into a single file, shouldn't
it be available as a separate package (like distcp, maybe)?
This feature of merging files would be very useful for users who would like to have only
one output file. For now they stick to a single reducer and avoid submitting
a job with multiple reducers (even though that would give better machine utilization). Would a generic
merge utility which understands the format and merges be useful? Something motivated
by https://issues.apache.org/jira/browse/HADOOP-2113
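
To make the distinction concrete, here is a minimal sketch in plain Java (no Hadoop APIs) contrasting concatenation with a k-way merge. It assumes each input part is already sorted, as the individual outputs of a multi-reducer job would be, and it treats records as simple strings compared lexicographically. The class name MergeSketch and the methods concatenate and mergeSorted are hypothetical names chosen for illustration, not existing Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {

    // Concatenation: append the parts one after another, in the order given.
    public static List<String> concatenate(List<List<String>> parts) {
        List<String> out = new ArrayList<>();
        for (List<String> part : parts) {
            out.addAll(part);
        }
        return out;
    }

    // The current head record of one part, plus an iterator over the rest of it.
    private static final class Cursor {
        final String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // k-way merge: ASSUMES every part is already sorted. A heap keyed on the
    // head record of each part always yields the globally smallest record next.
    public static List<String> mergeSorted(List<List<String>> parts) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> part : parts) {
            Iterator<String> it = part.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("apple", "melon"),
            Arrays.asList("banana", "zebra"));
        System.out.println(concatenate(parts)); // [apple, melon, banana, zebra]
        System.out.println(mergeSorted(parts)); // [apple, banana, melon, zebra]
    }
}
```

A format-aware utility would additionally need a record reader and a comparator per file format; the sketch only shows why sorted parts can be merged in a single streaming pass without running a job.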

> dfs -getMerge does not do what it says it does
> ----------------------------------------------
>                 Key: HADOOP-2120
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2120
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>             Fix For: 0.16.0
> dfs -getMerge, which calls FileUtil.copyMerge, contains this javadoc:
> {code}
> Get all the files in the directories that match the source file pattern
>    * and merge and sort them to only one file on local fs 
>    * srcf is kept.
> {code}
> However, it only concatenates the set of input files, rather than merging them in sorted
> order.
> Ideally, the copyMerge should be equivalent to a map-reduce job with IdentityMapper and
> IdentityReducer with numReducers = 1. However, not having to run this as a map-reduce job
> has some advantages, since it increases cluster utilization during reduce phase.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
