hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-619) Unify Map-Reduce and Streaming to take the same globbed input specification
Date Wed, 20 Dec 2006 21:21:24 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-619?page=comments#action_12460057 ] 
            
Sanjay Dahiya commented on HADOOP-619:
--------------------------------------

Why can't validateInput() simply call globPaths() and check that the results exist? The current
implementation is not only much more complicated, but I'm not sure that it's correct, since
it fails if any glob pattern fails to have matches. Is that what we want? I would think that
non-matching glob expressions, like empty directories, should be ignored so long as some of
the inputs exist. 

Doug: 
- In an earlier discussion with Owen (http://issues.apache.org/jira/browse/HADOOP-619#action_12458633)
we decided that non-matching patterns should throw errors, whereas empty directories are acceptable.
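
To make that rule concrete, here is a minimal sketch of a validateInput() that just expands each pattern and fails on the first one with no matches. It is not the code from the attached patch and does not use Hadoop's FileSystem API: it assumes the local filesystem via java.nio.file, handles only single-level patterns resolved against one base directory, and the class name GlobInputCheck and the error message are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: illustrates the agreed semantics only.
final class GlobInputCheck {

    /**
     * Expands every glob pattern against baseDir and returns all matches.
     * A pattern that matches nothing is an error; a matched directory that
     * happens to be empty is accepted like any other match.
     */
    static List<Path> validateInput(Path baseDir, List<String> patterns) throws IOException {
        List<Path> inputs = new ArrayList<>();
        for (String pattern : patterns) {
            int before = inputs.size();
            // newDirectoryStream(dir, glob) matches the glob against entry names in dir.
            try (DirectoryStream<Path> matches = Files.newDirectoryStream(baseDir, pattern)) {
                for (Path match : matches) {
                    inputs.add(match);
                }
            }
            if (inputs.size() == before) {
                throw new IOException("Input pattern " + pattern + " matches 0 files");
            }
        }
        return inputs;
    }
}
```

With these semantics, passing "part-*" together with a pattern that selects an empty directory succeeds, while adding a pattern such as "missing-*" that selects nothing makes the whole call fail.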


I'll see what's happening to the indentation.

> Unify Map-Reduce and Streaming to take the same globbed input specification
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-619
>                 URL: http://issues.apache.org/jira/browse/HADOOP-619
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.9.1
>            Reporter: eric baldeschwieler
>         Assigned To: Sanjay Dahiya
>             Fix For: 0.10.0
>
>         Attachments: Hadoop-619.patch, Hadoop-619.patch, Hadoop-619.patch, Hadoop-619_1.patch,
>                      Hadoop-619_1.patch, Hadoop-619_2.patch, Hadoop-619_2.patch
>
>
> Right now streaming input is specified very differently from other map-reduce input.
> It would be good if these two apps could take much more similar input specs.
> In particular, -input in streaming expects a file or glob pattern while MR takes a directory.
> It would be cool if both could take a glob pattern of files and if both took a directory by
> default (with some pattern excluded to allow logs, metadata and other framework output to be
> safely stored).
> We want to be sure that MR input is backward compatible over this change.  I propose
> that a single file or a single directory should be accepted as an input.  Globs should only
> match directories if the pattern is '/' terminated, to avoid massive inputs specified by mistake.
> Thoughts?
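
One possible reading of the '/'-terminated proposal above, as a rough sketch: a glob skips directories unless the pattern ends in '/', while a literal name (no glob characters) is accepted whether it is a file or a directory. This is an interpretation, not the behaviour of any Hadoop release. It assumes the local filesystem via java.nio.file and single-level patterns, and the name SlashTerminatedGlob is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: one reading of the trailing-slash rule.
final class SlashTerminatedGlob {

    /** Expands a single-level pattern against baseDir, applying the trailing-slash rule. */
    static List<Path> expand(Path baseDir, String pattern) throws IOException {
        boolean slashTerminated = pattern.endsWith("/");
        String glob = slashTerminated ? pattern.substring(0, pattern.length() - 1) : pattern;
        // A name without glob metacharacters is a plain file or directory name.
        boolean literal = !glob.matches(".*[*?\\[{].*");

        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(baseDir, glob)) {
            for (Path match : matches) {
                // Globs pick up directories only when the pattern ends in '/'.
                if (Files.isDirectory(match) && !slashTerminated && !literal) {
                    continue;
                }
                result.add(match);
            }
        }
        return result;
    }
}
```

Under this reading, "data-*" returns only matching files, "data-*/" also returns matching directories, and a plain name such as "data-dir" is accepted as a single directory, which keeps existing directory-style MR input working.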

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
