hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1768) Streaming command should be able to produce multiple outputs stored as separate DFS data sets
Date Fri, 07 May 2010 05:15:49 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1768:
-----------------------------------------------

    Component/s: contrib/streaming

> Streaming command should be able to produce multiple outputs stored as separate DFS data sets
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1768
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1768
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> Some streaming commands in the map or reduce phase produce several output files as a "side effect".
> The names of the output files may be hard-coded or specified on the command line.
> The streaming infrastructure should allow these files to be copied into DFS.
> For each distinct "output file name", a separate DFS dataset (DFS directory) should be created, and each individual task's file should be stored there as a part file. By default, the directory names may be derived from the main "output name".
> Related to https://issues.apache.org/jira/browse/HADOOP-2236: in the case of reduce, a single named output file may be seen as a special case of this.
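
The layout requested above could be sketched roughly as follows. This is only an illustration of the idea, not part of any Hadoop API: the function name `side_output_path`, the `<main output>_<file name>` directory scheme, and the five-digit `part-NNNNN` suffix are all assumptions made for the example.

```python
import posixpath


def side_output_path(main_output: str, side_file_name: str, task_index: int) -> str:
    """Return a hypothetical DFS path for one task's side-output file.

    Assumed scheme (not defined by the issue): every distinct side-output
    file name gets its own directory derived from the main output name,
    and each task contributes exactly one part file to that directory.
    """
    # Use only the file's base name, so a local path such as
    # "tmp/stats.txt" maps to the same dataset as "stats.txt".
    base = posixpath.basename(side_file_name)
    # Derive the dataset directory from the main output name.
    dataset_dir = f"{main_output.rstrip('/')}_{base}"
    # One part file per task, numbered by the task index.
    return posixpath.join(dataset_dir, f"part-{task_index:05d}")


print(side_output_path("/user/me/out", "errors.log", 3))
```

With this scheme, every task that writes `errors.log` would place its copy in the same DFS directory under a different part-file name, so the directory as a whole can be read as one dataset.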

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

