avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
Date Wed, 01 Sep 2010 18:13:53 GMT

    [ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905124#action_12905124 ]

Scott Carey commented on AVRO-647:

bq. Finally, to be clear, is there a motive for this beyond better expressing dependencies?
Functionally sticking everything in a single jar with lots of optional dependencies works
fine, but folks then have to guess which dependencies they actually need, and that's the primary
problem this seeks to solve. Is that right, or are there other problems too?

That is the main case here.  Dependencies become more explicit.  Users should be able to consume
the parts they need without too much accidental baggage.  Alternatively, we could simply document
all of this clearly so that users have the information they need to configure their builds
to exclude transitive dependencies they don't use.
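
As a rough sketch of that alternative, a downstream Maven build could declare the single avro
jar and exclude the transitive dependencies it does not use.  The coordinates below, the
${avro.version} property, and the choice of which artifacts to exclude are assumptions for
illustration, not a statement of what Avro's published POM actually lists:

```xml
<!-- Hypothetical consumer pom.xml fragment; all coordinates are assumed. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>${avro.version}</version>
  <exclusions>
    <!-- Assumed: this application never touches the hadoop integration. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
    <!-- Assumed: no HTTP server transport is needed either. -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Every consumer would have to repeat and maintain a list like this, which is the guesswork that
separate jars would remove.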

However, Avro is by nature something that many things will depend on, and portions of Avro
might themselves depend on many of those same things.  In particular, making it easy to avoid
circular dependencies is a plus.  As we have seen (https://issues.apache.org/jira/browse/AVRO-545),
even when it is possible to use ivy/maven features to prevent a circular dependency, doing so
makes users uneasy.

The guidelines I use for my projects are as follows:
* If the cascaded set of dependencies is large and likely to conflict with other things, it
should be easy to separate (for Avro, this is the hadoop dependency).
* If the dependency is physically large (large jar file), consider making it easy to separate.
* If the dependency is for a minor, rarely used feature, be careful.  For example, Jackson 1.0.1
being used by hadoop 0.20+ to dump configuration files to JSON causes problems.

So for the case of Reflect, if paranamer doesn't have a lot of cascaded dependencies itself
and is not a large jar on its own, then including it in avro-data is not going to be a big deal.

bq. If we separate jars, it might be good to split the build-time classpath in the same manner,
by splitting the src tree. 

We have three choices, I think:
1.  Leave the source tree as-is, and have the build use ant file excludes/includes to define
what is packaged in each one.   Managing the excludes/includes will be troublesome and would
be easier if the split was cleanly done by package.  Not much else would have to change --
the compile and test phases would stay the same.  There would also be the downside that tests
would not implicitly test the packaging boundaries.
2.  Break it into different source trees and continue using ant/ivy.  This is more work and
means we would be breaking up tests and compile phases too.
3.  Break it into different source trees and use maven.  Maven is a natural fit for this sort
of thing and I'm experienced with it, but it is not trivial and others here aren't as familiar
with it.  To wire up IDL and the Specific compiler,  Maven plugins would be required.  Interop
testing would probably still require ant. 
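
For choice 1, the packaging step might look roughly like the Ant fragment below.  The target
names, the ${build.classes} and ${dist.dir} properties, and the package patterns are all
hypothetical and only illustrate the include/exclude bookkeeping:

```xml
<!-- Hypothetical build.xml fragment: one compile step, several jars. -->
<target name="package-split" depends="compile">
  <!-- Core jar: everything except the (assumed) tool and hadoop packages. -->
  <jar destfile="${dist.dir}/avro.jar">
    <fileset dir="${build.classes}">
      <exclude name="org/apache/avro/tool/**"/>
      <exclude name="org/apache/avro/mapred/**"/>
    </fileset>
  </jar>
  <!-- Development jar: only the (assumed) tool packages. -->
  <jar destfile="${dist.dir}/avro-dev.jar">
    <fileset dir="${build.classes}">
      <include name="org/apache/avro/tool/**"/>
    </fileset>
  </jar>
  <!-- Hadoop jar: only the (assumed) hadoop integration packages. -->
  <jar destfile="${dist.dir}/avro-hadoop.jar">
    <fileset dir="${build.classes}">
      <include name="org/apache/avro/mapred/**"/>
    </fileset>
  </jar>
</target>
```

Because all classes still compile and test against one classpath, nothing in a setup like this
would catch a core class that accidentally references a class in an excluded package.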

> Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
> --------------------------------------------------------------
>                 Key: AVRO-647
>                 URL: https://issues.apache.org/jira/browse/AVRO-647
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
> Our dependencies are starting to get a little complicated on the Java side.
> I propose we build two (possibly more) jars related to our major dependencies and functions.
> 1. avro.jar  (or perhaps avro-core.jar)
> This contains all of the core avro functionality for _using_ avro as a library.  This
> excludes the specific compiler, avro idl, and other build-time or development tools, as well
> as avro packages for third party integration such as hadoop.  This jar should then have a
> minimal set of dependencies (jackson, jetty, SLF4J ?).
> 2. avro-dev.jar
> This would contain compilers, idl, development tools, etc.  Most applications will not
> need this, but build systems and developers will.
> 3. avro-hadoop.jar
> This would contain the hadoop API and possibly pig/hive/whatever related to that.  This
> makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies.
