hadoop-mapreduce-dev mailing list archives

From Mahadev Konar <maha...@hortonworks.com>
Subject Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Wed, 07 Sep 2011 18:27:00 GMT
I like the idea of having tools as a separate module, and I don't think
it will become a dumping ground unless we choose to make it one.

+1 for hadoop tools module under trunk.

thanks
mahadev

On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> Agreed, we should not have a dumping ground. IMO, what would go into
> hadoop-tools (i.e. distcp, streaming, and someone could argue for FsShell as
> well) are effectively Hadoop CLI utilities. Having them in a separate module
> rather than in the core modules (common, hdfs, mapreduce) does not mean
> that they are secondary things; it is just modularization. It will also help
> get those tools to use the public interfaces of the core modules, and when we
> finally have a clean hadoop-client layer, those tools should depend only on
> that.
>
> Finally, the fact that tools would end up under trunk/hadoop-tools does
> not prevent the HDFS and MAPREDUCE packaging from bundling the same or
> different tools.
>
> +1 for hadoop-tools/ (not binding)
>
> Thanks.
>
>
> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric818@gmail.com> wrote:
>
>> MapReduce and HDFS are distinct functions of Hadoop, and they are
>> loosely coupled.  A tools aggregator module would not have as clearly
>> distinct a function as the other Hadoop modules.  Hence, it is
>> possible for a tool to depend on both HDFS and MapReduce.  If
>> something breaks in the tools module, it is unclear which subproject
>> is responsible for maintaining it.  Therefore, it is safer to send
>> tools to the Incubator or Apache Extras than to deposit the utility
>> tools in a tools subcategory.  There are many short-lived projects
>> that attempt to associate themselves with Hadoop but are not
>> maintained.  It would be better to spin off those utility projects
>> than to use Hadoop as a dumping ground.
>>
>> In the previous discussion about removing contrib, most people were in
>> favor of doing so, and only a few contrib owners were reluctant to
>> remove it.  Even fewer people have participated in restoring the
>> functionality of broken contrib projects.  History speaks for itself.
>> -1 (non-binding) for hadoop-tools.
>>
>> regards,
>> Eric
>>
>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tucu@cloudera.com>
>> wrote:
>> > Eric,
>> >
>> > Personally I'm fine either way.
>> >
>> > Still, I fail to see why generic versus categorized tools would increase
>> > or reduce the risk of dead code, or how they would make packaging and
>> > deployment harder or easier.
>> >
>> > Would you please explain this?
>> >
>> > Thanks.
>> >
>> > Alejandro
>> >
>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric818@gmail.com> wrote:
>> >
>> >> Option #2, proposed by Amareshwari, seems like the better proposal.  We
>> >> don't want to repeat the history of contrib with hadoop-tools.  Having a
>> >> generic module like hadoop-tools increases the risk of accumulating dead
>> >> code.  It would be better to categorize the hdfs- or mapreduce-specific
>> >> tools in their respective subcategories.  It is also easier to manage
>> >> from a packaging/deployment perspective.
>> >>
>> >> regards,
>> >> Eric
>> >>
>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>> >>
>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw@apache.org> wrote:
>> >> >>
>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>> >> >>> We still need to answer Amareshwari's question (2) she asked some
>> >> >>> time back about the automated code compilation and test execution
>> >> >>> of the tools module.
>> >> >>
>> >> >>
>> >> >>
>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If not,
>> >> >>>>> what makes it different?
>> >> >>
>> >> >>
>> >> >>        I'm still waiting for this answer as well.
>> >> >>
>> >> >>        Until then, I would be pretty much against a tools module.
>> >> >>        Changing the name of the dumping ground doesn't make it any
>> >> >>        less of a dumping ground.
>> >> >
>> >> > IMO, if the tools module only gets stuff like distcp that's maintained,
>> >> > then it's not contrib; if it contains all the stuff from the current
>> >> > MR contrib, then tools is just a re-labeling of contrib. Given that
>> >> > this proposal only covers moving distcp to tools, it doesn't sound like
>> >> > contrib to me.
>> >> >
>> >> > Thanks,
>> >> > Eli
>> >>
>> >>
>> >
>>
>
