hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1330) Unifying Hadoop Steaming/Hadoop Pipe
Date Thu, 28 Jun 2007 10:09:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508781 ]

Devaraj Das commented on HADOOP-1330:

Some early thoughts on merging Streaming and Pipes (some of them are potential improvements
to Streaming):
1) The command lines for both can be unified, since they share quite a few arguments.
We could have a base class that handles the common arguments and subclasses that handle
the tool-specific ones. ToolBase is one candidate that could help here.
2) Both the Streaming and Pipes frameworks spawn Java Map/Reduce tasks that in turn spawn the
executables (such as Perl scripts or C++ executables). The main difference between the two approaches
is the communication protocol between the Java map/reduce processes and the executables:
Streaming uses stdin/stdout streams and Pipes uses sockets. One thing to investigate here
is the feasibility of implementing the Pipes protocol for the Streaming case.
3) The combiner in Pipes is more flexible: it allows native combiners and, with some tweaks,
can use Java combiners as well. This is missing in Streaming, where we are restricted to invoking
the user's combiner only through the Java framework.
4) Use of FileCache in Streaming.
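
As a rough illustration of point 1, the shared/specific split could look something like the
sketch below. This is not existing Hadoop code: the class names (JobCommand, StreamingCommand,
PipesCommand) are hypothetical, the option names are only examples, and a real version would
probably build on ToolBase rather than parse arguments by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base class: parses the options that both front ends share. */
abstract class JobCommand {
    private final Map<String, String> common = new HashMap<>();

    /** Walks the argument list; shared options are handled here, the rest by the subclass. */
    public final void parse(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String opt = args[i];
            if (opt.equals("-input") || opt.equals("-output")) {
                common.put(opt, requireValue(args, ++i, opt));
            } else {
                // The subclass returns the index of the last token it consumed.
                i = parseSpecific(args, i);
            }
        }
    }

    /** Handles one tool-specific option that starts at index i. */
    protected abstract int parseSpecific(String[] args, int i);

    protected static String requireValue(String[] args, int i, String opt) {
        if (i >= args.length) {
            throw new IllegalArgumentException("Missing value for " + opt);
        }
        return args[i];
    }

    public String get(String opt) {
        return common.get(opt);
    }
}

/** Hypothetical Streaming front end: adds a mapper command option. */
class StreamingCommand extends JobCommand {
    String mapper;

    @Override
    protected int parseSpecific(String[] args, int i) {
        if (args[i].equals("-mapper")) {
            mapper = requireValue(args, i + 1, "-mapper");
            return i + 1;
        }
        throw new IllegalArgumentException("Unknown option: " + args[i]);
    }
}

/** Hypothetical Pipes front end: adds an executable option. */
class PipesCommand extends JobCommand {
    String program;

    @Override
    protected int parseSpecific(String[] args, int i) {
        if (args[i].equals("-program")) {
            program = requireValue(args, i + 1, "-program");
            return i + 1;
        }
        throw new IllegalArgumentException("Unknown option: " + args[i]);
    }
}

class UnifiedCliDemo {
    public static void main(String[] args) {
        StreamingCommand cmd = new StreamingCommand();
        cmd.parse(new String[] {"-input", "in", "-output", "out", "-mapper", "wc.pl"});
        System.out.println(cmd.get("-input") + " " + cmd.get("-output") + " " + cmd.mapper);
    }
}
```

With this shape, adding a new common option touches only the base class, and each front end
rejects options it does not understand.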

That is the list so far; it would be great if others could add to it. I am planning to reuse
Pipes as much as possible for the Streaming framework. Also, please let me know if there are
features in Streaming that we want to introduce in Pipes.
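
To make point 2 a bit more concrete: since stdin/stdout and a socket both reduce to a pair of
byte streams, one way to explore running a Pipes-style protocol in the Streaming case would be
to hide the link behind a small interface. The sketch below is only that, a sketch. The names
ChildTransport, StdioTransport, SocketTransport and RecordFraming are made up for illustration,
and the length-prefixed framing is a stand-in, not the actual Pipes wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical abstraction over the link between the Java task and the child executable. */
interface ChildTransport {
    OutputStream toChild() throws IOException;
    InputStream fromChild() throws IOException;
}

/** Streaming-style link: the child's stdin and stdout. */
class StdioTransport implements ChildTransport {
    private final Process child;

    StdioTransport(Process child) {
        this.child = child;
    }

    public OutputStream toChild() {
        return child.getOutputStream(); // connected to the child's stdin
    }

    public InputStream fromChild() {
        return child.getInputStream(); // connected to the child's stdout
    }
}

/** Pipes-style link: a socket connected to the child. */
class SocketTransport implements ChildTransport {
    private final Socket socket;

    SocketTransport(Socket socket) {
        this.socket = socket;
    }

    public OutputStream toChild() throws IOException {
        return socket.getOutputStream();
    }

    public InputStream fromChild() throws IOException {
        return socket.getInputStream();
    }
}

/** Illustrative length-prefixed key/value framing; NOT the real Pipes protocol. */
final class RecordFraming {
    static void write(OutputStream out, String key, String value) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        writeField(data, key);
        writeField(data, value);
        data.flush();
    }

    static String[] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        return new String[] {readField(data), readField(data)};
    }

    private static void writeField(DataOutputStream data, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        data.writeInt(bytes.length);
        data.write(bytes);
    }

    private static String readField(DataInputStream data) throws IOException {
        byte[] bytes = new byte[data.readInt()];
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class TransportDemo {
    public static void main(String[] args) throws IOException {
        // In-memory round trip; a real task would use toChild()/fromChild() instead.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        RecordFraming.write(buffer, "word", "1");
        String[] record = RecordFraming.read(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(record[0] + "\t" + record[1]);
    }
}
```

The framing code never sees which transport it is writing to, which is the property that would
let the same protocol code serve both frameworks.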

> Unifying Hadoop Steaming/Hadoop Pipe
> ------------------------------------
>                 Key: HADOOP-1330
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1330
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>            Assignee: Owen O'Malley
> Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to
factor out the commonality in the implementation and to unify the user interface as much as

