hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6387) FsShell -getmerge source file pattern is broken
Date Tue, 31 May 2011 15:34:49 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041629#comment-13041629
] 

Daryn Sharp commented on HADOOP-6387:
-------------------------------------

Using checkDest adds new logic that I don't think is warranted: the dest is supposed to be
a file, per the usage of copyMerge. With this change, if the given dest is a directory, then
the output is the dest + basename of the first source directory. I feel that's unexpected
and surprising behavior, so I'd like to see that reverted.

There's an easier way to get at the full arg list higher in the call stack: processArguments().
Hook in there and eliminate processPath(s), since those are intended to process each path
individually. In processArguments, make a call to super (don't copy-n-paste like in the patch),
and then call FileUtil's copyMerge. Implement processPathArgument with an empty body to
short-circuit the calls to the now-removed processPath(s).
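
A rough sketch of that structure, using stand-in classes rather than the real FsShell code. The class names, the method signatures, and the copyMerge stub that only records what it would merge are all assumptions for illustration; only the method names processArguments, processPathArgument and copyMerge come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GetMergeSketch {
    /** Hypothetical stand-in for the shell's base command class. */
    static class Command {
        final List<String> log = new ArrayList<>();

        // Walks the full argument list and hands each item to the per-path hook.
        protected void processArguments(List<String> args) {
            for (String arg : args) {
                processPathArgument(arg);
            }
        }

        // Default per-path behaviour.
        protected void processPathArgument(String path) {
            log.add("processed " + path);
        }
    }

    /** Hypothetical merge command that needs the whole argument list at once. */
    static class Merge extends Command {
        private final String dst;

        Merge(String dst) {
            this.dst = dst;
        }

        @Override
        protected void processArguments(List<String> args) {
            super.processArguments(args); // reuse the base traversal, no copy-paste
            copyMerge(args, dst);         // then act once on the full list
        }

        @Override
        protected void processPathArgument(String path) {
            // Intentionally empty: per-path processing is not wanted for a merge.
        }

        // Stub standing in for FileUtil's copyMerge; it only records the request.
        void copyMerge(List<String> srcs, String dst) {
            log.add("merge " + String.join(",", srcs) + " -> " + dst);
        }
    }

    static List<String> run(List<String> srcs, String dst) {
        Merge merge = new Merge(dst);
        merge.processArguments(srcs);
        return merge.log;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("/user/eli/1.txt", "/user/eli/2.txt"), "sorted.txt"));
    }
}
```

Because the per-path hook is empty, the only recorded action is the single merge over all sources, which is the point of hooking in at processArguments.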

(Just an aside: I've been tempted to eliminate the call to FileUtil and just use processPath
to append to the dst file.)
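
To illustrate the per-path append idea, here is a minimal sketch against the local filesystem using java.nio rather than Hadoop's FileSystem API. The class name, the processPath signature, and the demo file names are assumptions chosen for the example, not the shell's actual code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AppendMergeSketch {
    // Hypothetical per-path step: append one source file's bytes to the open dst stream.
    static void processPath(Path src, OutputStream dstOut) throws IOException {
        Files.copy(src, dstOut);
    }

    // Opens dst once (created or truncated by default) and appends each source in order.
    static void merge(List<Path> srcs, Path dst) throws IOException {
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path src : srcs) {
                processPath(src, out);
            }
        }
    }

    // Builds three small files in a temp directory, merges them, and returns the result.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("getmerge-sketch");
            List<Path> srcs = new ArrayList<>();
            for (String name : new String[] {"1", "2", "3"}) {
                Path p = dir.resolve(name + ".txt");
                Files.write(p, (name + "\n").getBytes(StandardCharsets.UTF_8));
                srcs.add(p);
            }
            Path dst = dir.resolve("sorted.txt");
            merge(srcs, dst);
            return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```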

> FsShell -getmerge source file pattern is broken
> -----------------------------------------------
>
>                 Key: HADOOP-6387
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6387
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.23.0
>            Reporter: Eli Collins
>            Assignee: XieXianshan
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-6387.patch
>
>
> The FsShell -getmerge command doesn't work if the "source file pattern" matches files.
> See below. If the current behavior is intended then we should update the help documentation
> and java docs to match, but it would be nice if the user could specify a set of files in a
> directory rather than just directories.
> {code}
> $ hadoop fs -help getmerge
> -getmerge <src> <localdst>:  Get all the files in the directories that 
> 		match the source file pattern and merge and sort them to only
> 		one file on local fs. <src> is kept.
> $ hadoop fs -ls
> Found 3 items
> -rw-r--r--   1 eli supergroup          2 2009-11-23 17:39 /user/eli/1.txt
> -rw-r--r--   1 eli supergroup          2 2009-11-23 17:39 /user/eli/2.txt
> -rw-r--r--   1 eli supergroup          2 2009-11-23 17:39 /user/eli/3.txt
> $ hadoop fs -getmerge /user/eli/*.txt sorted.txt
> $ cat sorted.txt
> cat: sorted.txt: No such file or directory
> $ hadoop fs -getmerge /user/eli/* sorted.txt
> $ cat sorted.txt
> cat: sorted.txt: No such file or directory
> $ hadoop fs -getmerge /user/* sorted.txt
> $ cat sorted.txt 
> 1
> 2
> 3
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
