hadoop-common-dev mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Developing cross-component patches post-split
Date Fri, 03 Jul 2009 15:56:49 GMT
Owen O'Malley wrote:
> On Wed, Jul 1, 2009 at 6:45 PM, Todd Lipcon<tlipcon@gmail.com> wrote:
>> Agree with Phillip here. Requiring a new jar to be checked in anywhere after
>> every common commit seems unscalable and nonperformant. For git users this
>> will make the repository size balloon like crazy (the jar is 400KB and we
>> have around 5300 commits so far = 2GB!).
> This is silly. Obviously, just like the source the jars compress
> across versions very well.
>> I think it would be reasonable to require that developers check out a
>> structure like:
>> working-dir/
>>  hadoop-common/
>>  hadoop-mapred/
>>  hadoop-hdfs/
> -1 They are separate subprojects. In the medium term, mapreduce and
> hdfs should compile and run against the released version of common.
> Checking in the jars is a temporary step while the interfaces in
> common stabilize. Furthermore, I expect the volume in common should be
> much lower than in mapreduce or hdfs.
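Todd's 2GB figure is a straight multiplication of jar size by commit count, while Owen's objection is about how well successive jars delta-compress. A rough sketch of both positions follows. The 5% per-commit delta ratio is a hypothetical placeholder, not a measurement. Jars are already zip-compressed, so the real ratio is an open question.

```python
# Back-of-the-envelope estimate of repository growth from checking in a jar
# on every commit. JAR_KB and COMMITS come from the thread; the delta ratio
# passed to compressed_growth_gb() is a hypothetical assumption.
JAR_KB = 400
COMMITS = 5300


def naive_growth_gb(jar_kb: float, commits: int) -> float:
    """Growth in decimal GB (1 GB = 1,000,000 KB) if every jar is stored in full."""
    return jar_kb * commits / 1_000_000


def compressed_growth_gb(jar_kb: float, commits: int, ratio: float) -> float:
    """Growth if the first jar is stored in full and each later one costs
    only `ratio` of its full size (an assumed delta-compression ratio)."""
    if commits == 0:
        return 0.0
    return (jar_kb + jar_kb * ratio * (commits - 1)) / 1_000_000


print(f"naive: {naive_growth_gb(JAR_KB, COMMITS):.2f} GB")
print(f"hypothetical 5% delta: {compressed_growth_gb(JAR_KB, COMMITS, 0.05):.2f} GB")
```

The naive case reproduces the ~2GB estimate. Whether the compressed case is realistic depends entirely on the ratio, which is exactly the point in dispute.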

There are various use cases here:

- people working in hdfs who don't need mapred (though they should use it
for regression testing their work) but do need a stable common
- people working in mapred who need a working common/hdfs
- someone trying to work across all three (or in common, which from a
regression-testing viewpoint amounts to the same thing)
- someone who just wants all the code for debugging/using mapreduce or
other bits of hadoop

For anyone working at the source level, where the libraries they build
against keep changing, having the separate projects in subdirs with common
targets is invaluable; ivy can do the glue. But at the same time, should
you require everyone working on mapred to pull down and build common and
