hadoop-mapreduce-dev mailing list archives

From <Milind.Bhandar...@emc.com>
Subject Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Wed, 07 Sep 2011 18:32:23 GMT
+1 for a separate hadoop-tools module. However, if a tool is broken at
release time, and no one comes forward to fix it, it should be removed
(i.e., unlike contrib modules, where build and test failures were
tolerated).

- milind

On 9/7/11 11:27 AM, "Mahadev Konar" <mahadev@hortonworks.com> wrote:

>I like the idea of having tools as a separate module and I don't think
>that it will be a dumping ground unless we choose to make it one.
>
>+1 for hadoop tools module under trunk.
>
>thanks
>mahadev
>
>On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
>wrote:
>> Agreed, we should not have a dumping ground. IMO, what would go into
>> hadoop-tools (i.e. distcp, streaming, and someone could argue for
>> FsShell as well) are effectively hadoop CLI utilities. Having them in
>> a separate module rather than in the core modules (common, hdfs,
>> mapreduce) does not mean that they are secondary things, just
>> modularization. Also, it will help to get those tools to use the
>> public interfaces of the core modules, and when we finally have a
>> clean hadoop-client layer, those tools should only depend on that.
>>
>> Finally, the fact that tools would end up under trunk/hadoop-tools
>> does not prevent the HDFS and MAPREDUCE packaging from bundling the
>> same/different tools.
>>
>> +1 for hadoop-tools/ (not binding)
>>
>> Thanks.
>>
>>
>> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric818@gmail.com> wrote:
>>
>>> Mapreduce and HDFS are distinct functions of Hadoop.  They are
>>> loosely coupled.  If we have a tools aggregator module, it will not
>>> have as clearly distinct a function as the other Hadoop modules.
>>> Hence, it is possible for a tool to depend on both HDFS and map
>>> reduce.  If something broke in the tools module, it is unclear which
>>> subproject is responsible for maintaining the tools.  Therefore, it
>>> is safer to send tools to incubator or apache extra rather than
>>> deposit the utility tools in a tools subcategory.  There are many
>>> short-lived projects that attempt to associate themselves with
>>> Hadoop but are not being maintained.  It would be better to spin off
>>> those utility projects than to use Hadoop as a dumping ground.
>>>
>>> In the previous discussion about removing contrib, most people were
>>> in favor of doing so, and only a few contrib owners were reluctant
>>> to remove contrib.  Fewer people have participated in restoring the
>>> functionality of broken contrib projects.  History speaks for itself.
>>> -1 (non-binding) for hadoop-tools.
>>>
>>> regards,
>>> Eric
>>>
>>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tucu@cloudera.com>
>>> wrote:
>>> > Eric,
>>> >
>>> > Personally I'm fine either way.
>>> >
>>> > Still, I fail to see why generic/categorized tools increase/reduce
>>> > the risk of dead code and how they make packaging and deployment
>>> > more difficult/easier.
>>> >
>>> > Would you please explain this?
>>> >
>>> > Thanks.
>>> >
>>> > Alejandro
>>> >
>>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric818@gmail.com> wrote:
>>> >
>>> >> Option #2, proposed by Amareshwari, seems like a better proposal.
>>> >> We don't want to repeat the history of contrib with hadoop-tools.
>>> >> Having a generic module like hadoop-tools increases the risk of
>>> >> accumulating dead code.  It would be better to categorize the hdfs
>>> >> or mapreduce specific tools in their respective subcategories.  It
>>> >> is also easier to manage from a packaging/deployment perspective.
>>> >>
>>> >> regards,
>>> >> Eric
>>> >>
>>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>>> >>
>>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw@apache.org>
>>> wrote:
>>> >> >>
>>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>>> >> >>> We still need to answer Amareshwari's question (2) she asked
>>> >> >>> some time back about the automated code compilation and test
>>> >> >>> execution of the tools module.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If
>>> >> >>>>> not, what makes it different?
>>> >> >>
>>> >> >>
>>> >> >>        I'm still waiting for this answer as well.
>>> >> >>
>>> >> >>        Until such, I would be pretty much against a tools module.
>>> >> >>  Changing the name of the dumping ground doesn't make it any less
>>> >> >>  of a dumping ground.
>>> >> >
>>> >> > IMO if the tools module only gets stuff like distcp that's
>>> >> > maintained, then it's not contrib; if it contains all the stuff
>>> >> > from the current MR contrib, then tools is just a re-labeling of
>>> >> > contrib. Given that this proposal only covers moving distcp to
>>> >> > tools, it doesn't sound like contrib to me.
>>> >> >
>>> >> > Thanks,
>>> >> > Eli
>>> >>
>>> >>
>>> >
>>>
>>
>

