spark-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Dependency hell in Spark applications
Date Fri, 05 Sep 2014 23:14:28 GMT
From the output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @
spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

> excluding httpclient from spark-streaming dependency in your sbt/maven
> project

This should work.
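As a sketch (assuming an sbt build; artifact coordinates are taken from the dependency tree above, and the AWS SDK version is an assumption), the exclusion TD suggests would look something like:

```scala
// build.sbt -- sketch: exclude the transitive httpclient 4.1.2 that
// spark-streaming pulls in via jets3t, so the AWS SDK's 4.2 wins.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-streaming" % "1.1.0")
    .exclude("org.apache.httpcomponents", "httpclient"),
  // The AWS SDK then supplies httpclient 4.2 transitively.
  "com.amazonaws" % "aws-java-sdk" % "1.8.11" // version is an assumption
)
```

The Maven equivalent would be an <exclusions> block on the spark-streaming dependency.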


On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das <tathagata.das1565@gmail.com>
wrote:

> If httpClient dependency is coming from Hive, you could build Spark without
> Hive. Alternatively, have you tried excluding httpclient from
> spark-streaming dependency in your sbt/maven project?
>
> TD
>
>
>
> On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers <koert@tresata.com> wrote:
>
> > custom spark builds should not be the answer. at least not if spark ever
> > wants to have a vibrant community for spark apps.
> >
> > spark does support a user-classpath-first option, which would deal with
> > some of these issues, but I don't think it works.
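For reference, the option Koert mentions is the experimental spark.files.userClassPathFirst setting in the 1.x line (it applies on executors only, which may be why it doesn't fully solve the problem); a minimal sketch, with the app name hypothetical:

```scala
// Sketch: ask executors to prefer classes from the user's jars over
// Spark's assembly. Experimental in Spark 1.x and executor-only.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kinesis-app") // hypothetical
  .set("spark.files.userClassPathFirst", "true")
```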
> > On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego" <fborrego@gilt.com> wrote:
> >
> > > Hi,
> > > Hi,
> > > I ran into the same issue and, apart from the ideas Aniket mentioned, I
> > > could only find a nasty workaround: add my custom
> > > PoolingClientConnectionManager to my classpath.
> > >
> > > http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
> > >
> > > On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen <sowen@cloudera.com> wrote:
> > >
> > > > Dumb question -- are you using a Spark build that includes the Kinesis
> > > > dependency? That build would have resolved conflicts like this for
> > > > you. Ideally, your app would use the same version of the Kinesis
> > > > client SDK.
> > > >
> > > > All of these ideas are well-known, yes. Super-common dependencies
> > > > like Guava are already shaded. This is a less-common source of
> > > > conflicts, so I don't think http-client is shaded, especially since
> > > > it is not used directly by Spark. I think this is a case of your app
> > > > conflicting with a third-party dependency?
> > > >
> > > > I think OSGi is deemed too over the top for things like this.
> > > > On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
> > > > <aniket.bhatnagar@gmail.com> wrote:
> > > > > I am trying to use Kinesis as a source for Spark Streaming and have
> > > > > run into a dependency issue that can't be resolved without making my
> > > > > own custom Spark build. The issue is that Spark is transitively
> > > > > dependent on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think
> > > > > because of libfb303 coming from hbase and hive-serde) whereas the
> > > > > AWS SDK is dependent on org.apache.httpcomponents:httpclient:jar:4.2.
> > > > > When I package and run my Spark Streaming application, I get the
> > > > > following:
> > > > >
> > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
> > > > >         at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
> > > > >         at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
> > > > >         at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
> > > > >         at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
> > > > >         at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
> > > > >
> > > > > I can create a custom Spark build with
> > > > > org.apache.httpcomponents:httpclient:jar:4.2 included in the
> > > > > assembly, but I was wondering if this is something Spark devs have
> > > > > noticed and are looking to resolve in upcoming releases. Here are my
> > > > > thoughts on this issue:
> > > > >
> > > > > Containers that run custom user code often have to resolve
> > > > > dependency conflicts between the framework's and the user code's
> > > > > dependencies. Here is how I have seen some frameworks resolve the
> > > > > issue:
> > > > > 1. Provide a child-first class loader: Some JEE containers provide a
> > > > > child-first class loader that loads classes from user code first. I
> > > > > don't think this approach completely solves the problem, as the
> > > > > framework then becomes susceptible to class mismatch errors.
> > > > > 2. Fold all dependencies into a sub-package: This approach involves
> > > > > relocating all dependencies into a project-specific sub-package
> > > > > (like spark.dependencies). This approach is tedious because it
> > > > > involves building custom versions of all dependencies (and their
> > > > > transitive dependencies).
> > > > > 3. Use something like OSGi: Some frameworks have successfully used
> > > > > OSGi to manage dependencies between modules. The challenge in this
> > > > > approach is to OSGify the framework and hide OSGi's complexities
> > > > > from the end user.
> > > > >
> > > > > My personal preference is OSGi (or at least some support for OSGi),
> > > > > but I would love to hear what Spark devs are thinking in terms of
> > > > > resolving the problem.
> > > > >
> > > > > Thanks,
> > > > > Aniket
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > > > For additional commands, e-mail: dev-help@spark.apache.org
> > > >
> > > >
> > >
> >
>
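Aniket's option 2 above (relocating dependencies into a project-specific sub-package) is what shading tools automate at build time rather than by rebuilding every dependency by hand. A sketch using sbt-assembly's shade rules (this feature arrived in sbt-assembly 0.14+, after this thread; the plugin being present and the target package name are assumptions):

```scala
// build.sbt fragment -- sketch of shading with sbt-assembly.
// Relocates the conflicting httpclient packages inside the application
// jar so they cannot clash with the copy on Spark's classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "myapp.shaded.http.@1").inAll
)
```

Maven users would reach for the maven-shade-plugin's relocation feature to the same effect.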
