hbase-dev mailing list archives

From Kay Kay <kaykay.uni...@gmail.com>
Subject Re: Moving to Maven (HBASE-2099)
Date Sat, 13 Feb 2010 11:02:36 GMT
Given that both Ivy and Maven address the issue of dependency 
management, I can broadly think of 2 reasons (the producer side and the 
consumer side), with the other advantages falling under them one way or 
another.

* The maintainer of the project is responsible for handling 
dependencies and upgrading them as the need arises.
    Just like any other encapsulation we talk about, the transitive 
dependencies (level 2 of the dependency tree) of the primary 
dependencies we are concerned with are entirely hidden from us, since 
identifying and listing them (in pom.xml / ivy.xml, as appropriate) is 
the job of the corresponding package owner, not of the user of the 
package.

   With appropriate test suites in place, it becomes easy to flip to / 
test-drive a new version before reverting or upgrading, especially 
when a project publishes more than 1 artifact.

* As a consumer of the project, if the maintainer publishes the 
artifacts along with their dependencies, it becomes easy to assemble 
the blocks with only the primary dependencies listed (and not worry 
about the other libraries that might be needed, or run into nasty 
NoClassDefFoundErrors at runtime).
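
To make the consumer side concrete: with a published pom, a downstream 
project declares only its primary dependency, and the resolver pulls in 
the transitive tree that the maintainer published. A minimal sketch of 
such a pom fragment (the version shown is illustrative):

```xml
<!-- Only the primary dependency is declared; its own dependencies
     (level 2 of the tree) are resolved from the pom that the
     package maintainer published alongside the artifact. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.1</version>
  </dependency>
</dependencies>
```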

I agree that both of these sound very theoretical and idealistic. The 
case where it works in a straightforward way is when you are setting up 
a project: you can get up and running by listing only the handful of 
dependencies you know your project needs, without trying to gather the 
entire transitive list behind the scenes, since with mvn / ivy that 
list is already defined by the package maintainer and it is best to use 
it.

This might not make much sense for hadoop itself, but for a much larger 
codebase, as consumers of it, it would be very useful to have these 
blocks 'blessed' by the original contributors / maintainers, so that 
consumers can keep up with / test-drive new versions before actually 
making a decision one way or another, as opposed to removing all the 
old versions and downloading and adding the new ones to the ./lib 
directory.

Recently we had an upgrade of the thrift library, and during the 
process there were some discussions about the guts of the thrift code, 
such as its use of commons-lang for a hashcode implementation (or 
something similar to that). While that was definitely informative, as 
users of the library it is something that should have been transparent 
to us, had such a process already been in place, as opposed to having 
to dig into the internals.

On the other end of the spectrum: currently, anybody planning to get 
started with just the client framework of HBase, to communicate with 
the eco-system (zk, master, region servers, etc.), has to pull in every 
other dependency listed by hbase (not everybody has the time / 
resources to play around with the sources and figure out the minimal 
subset). Assuming the server setup is complete and the clients live on 
a different machine, all that is needed is a scaled-down client library 
(not the giant list) that knows the ipc semantics, without worrying 
about the server internals. So, on the publication side, once the 
ivy/maven-ization is complete, hbase can start publishing different 
artifacts to be used depending on the consumer's needs.
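
On the consumer side that would boil down to a single declaration. A 
sketch, assuming a hypothetical 'hbase-client' artifact (neither the 
artifact name nor the version below exists today):

```xml
<!-- 'hbase-client' and the version are hypothetical; such an
     artifact would carry only the client/ipc classes and their
     dependencies, not the server-side jars. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.21.0</version>
</dependency>
```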

A great candidate for such a case would be the mahout project, which 
uses hbase for one of its algorithm implementations and has no need to 
carry the giant load of hbase, say.

Specifically for hdfs (and the hadoop projects in general), I can 
definitely see where the frustration comes from: the necessity of 
keeping up with snapshots that are in a constant state of flux, which 
in turn slows down the build. Ironically, that is the right way to keep 
up with dependencies; as consumers of other projects we can be less 
aggressive about how frequently we track upstream, while still having 
an easy option to try out new versions as they become available.
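
The cost shows up in the ivy.xml declaration for such snapshot 
dependencies: marking them as changing forces ivy to re-check the 
repository for a newer snapshot on every resolve. A sketch of the 
relevant declaration (the revision shown is illustrative):

```xml
<!-- changing="true" tells Ivy that the artifact at this revision can
     be republished (a snapshot), so the local cache is not trusted
     and the repository is re-checked on every resolve; that
     re-check is exactly where the slow builds come from. -->
<dependency org="org.apache.hadoop" name="hadoop-core"
            rev="0.21.0-SNAPSHOT" changing="true"/>
```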

As we can see, it is just another eco-system, and one that thrives when 
everybody plays by the rules. Bad / inconsistent pom-s and ivy-s 
definitely do exist, as I discovered (on some of the hadoop-xyz 
projects). While that can be extremely frustrating, a community that is 
open to receiving patches to address them will help the eco-system 
thrive. Viewing the dependency graph of a given project with the other 
stakeholders in the same room (applicable within a 'shop' as opposed to 
'oss' projects) will bring 'ball-of-mud' / 'code duplication' patterns 
to light using graph theory 101 principles, and give an idea of when 
people should start refactoring existing code. (Ideally it would be a 
dependency tree, but for all practical purposes it becomes a dependency 
graph, for a variety of reasons, some of them non-technical, as anybody 
can guess!)
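
One low-effort way to put that graph in front of everybody in the room, 
assuming a mavenized build, is the dependency plugin:

```shell
# Prints the resolved dependency tree of the current module;
# -Dverbose also shows the duplicates/conflicts that were omitted.
mvn dependency:tree -Dverbose
```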

As Mathias pointed out, while maven / ivy solve the dependency 
management problem, maven's rigid rules sometimes make the build 
process go crazy (especially when migrating from a 
non-dependency-management world with custom scripts / unconventional 
'target' lifecycles). I am +1 on the maven part for HBase since I do 
not see the need for such unconventional steps, provided we keep the 
scope of hbase-core restricted to what it does today, with other 
plugins to hbase appearing as separate libraries / apps as opposed to 
overloading the codebase with more responsibilities. A detailed 
comparison between maven and ivy warrants a separate post altogether, 
so I am not going to continue on that here.

I have listed these from my own experiences; they do not reflect any 
official stance behind the 'ivy'-ization of hadoop, or even of hbase 
for that matter. In reality, for the longevity of a code base, such a 
'blessed' eco-system helps everyone in it.

   K K.

On 02/13/2010 01:51 AM, Dhruba Borthakur wrote:
> My personal experience is that the ivy-maven-stuff introduced into the
> Hadoop build system has tremendously slowed down the Hadoop build process. I
> am sure that this disadvantage is offset by some advantages that I am not
> aware of. If you could educate me on the top two advantages that accrued to
> Hadoop after moving to the new build process, that would be awesome.
> thanks a bunch,
> dhruba
> On Sat, Feb 13, 2010 at 1:44 AM, Kay Kay<kaykay.unique@gmail.com>  wrote:
>> On 02/13/2010 01:29 AM, Dhruba Borthakur wrote:
>>>    From what I understand, the slowness of 'ivy' can be reduced if you
>>> fetch the dependent jars from a local ivy server, isn't it?
>> The problem discussed is an artifact of hbase trying to keep up with the
>> most recent snapshots of hadoop-core / hdfs / mapred; hence the ivy
>> resolution is expensive, since it hits the mvn repository every time to
>> check for the latest snapshot, if any. So the slowness is due to the
>> necessity of keeping up with the dependencies to identify issues early in
>> the cycle. Specifically this can be attributed to changing="true" in all
>> the ivy.xml-s in hbase for the hadoop artifacts. I am looking into making
>> it a configurable option to avoid the expensive build time.
>> This would not be an issue for an hbase release depending on released
>> versions of hadoop-core / mapred / common etc.
>> Both ivy and maven do cache the artifacts locally, making the roundtrip
>> redundant (except for the first time, of course), so this should not be an
>> issue for people building a release from sources, since it would be moot
>> by then.
>>   thanks,
>>> dhruba
>>> On Sat, Feb 13, 2010 at 12:25 AM, Kay Kay<kaykay.unique@gmail.com>
>>>   wrote:
>>>> Mathias -
>>>>    I have been using Ivy / Maven interchangeably in different projects
>>>> for build management. Both of them clearly have their strong points and
>>>> drawbacks. Ivy fits thrift well because of the nature of the tasks
>>>> involved, using external command-line tools (the thrift generators)
>>>> etc. As I mentioned before, HBase does not have such cross-cutting
>>>> maven goals, as its build lifecycle is pretty straightforward.
>>>>   In any case, the intention is to publish HBase artifacts, maintain a
>>>> smaller core, and encourage contribs built on the artifacts as opposed
>>>> to getting into the codebase.
>>>> Once HBase artifacts are published, the contribs / plugins for the
>>>> same would be free to use ivy (with m2compatible="true") / maven as
>>>> appropriate.
>>>> Ryan -
>>>>    The slowness is attributable to changing="true" in the ivy.xml-s
>>>> for all the hadoop-common / -hdfs / -mapreduce snapshots that we are
>>>> using. I am facing similar slowness with other mvn hadoop (snapshot)
>>>> dependencies as well. In retrospect, that should have been made a
>>>> configurable flag in libraries.properties to ease things. Hopefully
>>>> that is sorted out soon.
>>>> On 02/13/2010 12:10 AM, Ryan Rawson wrote:
>>>>> Would you mind elaborating more?  At the moment, most people do not
>>>>> build hbase, and the POM/jar/publishing thing is orthogonal - those
>>>>> who wish to build their own projects with ivy and/or ant are free to
>>>>> do so and not be impacted by our use of maven.
>>>>> We have ivy, but it doesn't integrate with our IDEs and is rather slow
>>>>> to build and rebuild.
>>>>> On Sat, Feb 13, 2010 at 12:03 AM, Mathias Herberts
>>>>> <mathias.herberts@gmail.com>    wrote:
>>>>>> -1
>>>>>> I think Maven is too complex and will lower the adoption of HBase by
>>>>>> people willing to build it today.
>>>>>> I would suggest using Ivy for dependency management as was done in
>>>>>> Thrift.
>>>>>> Mathias.
