hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12547) Deprecate hadoop-pipes
Date Thu, 05 Nov 2015 03:59:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991078#comment-14991078 ]

Allen Wittenauer commented on HADOOP-12547:

bq.  I'm curious what the remaining cases are where hadoop-pipes is a better option than streaming.

Why do people use streaming instead of the Java MR API? Or even, why do people use Java MR
instead of streaming?

Meanwhile, it looks like large chunks of the pipes documentation got dropped somewhere between
1.x and 0.23.  It's definitely documented in 1.x:  

1.2.1: https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/pipes/package-summary.html

0.23 through 2.7.1:  https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/pipes/package-summary.html

Now I'm even less inclined to remove it: 

* we have people actually using it
* we have documentation that we can re-instate
* it actually does compile and has compiled for a very long time (albeit not in a very convenient
way... see the other JIRA to fix that though)
* we haven't removed or deprecated MRv1 yet either, and these two seem fairly tied together
given the history of why it exists
* missing unit tests... while a concern... well, if we removed everything that didn't have
unit tests, we'd be dropping large portions of the source base, including pretty much all
of the compiled C/C++ code

So yeah, I'm definitely -1 at this point. 

> Deprecate hadoop-pipes
> ----------------------
>                 Key: HADOOP-12547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12547
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
> Development appears to have stopped on hadoop-pipes upstream for the last few years,
> aside from very basic maintenance.  Hadoop streaming seems to be a better alternative, since
> it supports more programming languages and is better implemented.
> There were no responses to a message on the mailing list asking for users of Hadoop pipes...
> and in my experience, I have never seen anyone use this.  We should remove it to reduce our
> maintenance burden and build times.

This message was sent by Atlassian JIRA
