spark-dev mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: Dependency hell in Spark applications
Date Mon, 22 Sep 2014 11:15:08 GMT
I have submitted a defect in JIRA for this:
https://issues.apache.org/jira/browse/SPARK-3638 and have submitted a PR (
https://github.com/apache/spark/pull/2489) that temporarily fixes the
issue. Users would have to build Spark with kinesis-asl to get a
compatible httpclient added to the Spark assembly jar.
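For reference, building the assembly with the kinesis-asl module enabled looks roughly like this (a sketch; the extra Hadoop/Hive profile flags are illustrative and depend on your deployment):

```shell
# Enable the kinesis-asl profile so its httpclient lands in the assembly
# (other profiles shown are examples, not requirements of the fix).
mvn -Pkinesis-asl -Phadoop-2.4 -Phive -DskipTests clean package
```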

On 22 September 2014 15:00, 이인규(inQ) <gofiri@gmail.com> wrote:

> Hello,
>
> In my case, I manually deleted the org/apache/http directory in the
> spark-assembly jar file.
> I think if we use the latest version of the httpclient (httpcore)
> library, we can resolve the problem.
> How about upgrading httpclient? (or jets3t?)
>
> 2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar <aniket.bhatnagar@gmail.com>:
>
>> Thanks everyone for weighing in on this.
>>
>> I had backported the Kinesis module from master to Spark 1.0.2, so just
>> to confirm I am not missing anything, I compared the dependency graph of
>> my Spark build with spark-master,
>> and org.apache.httpcomponents:httpclient:jar does seem to resolve to the
>> 4.1.2 dependency.
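A comparison like this can be narrowed to the relevant branch of the tree with Maven's dependency plugin, for example:

```shell
# Show only the httpclient/httpcore branches of the dependency tree
mvn dependency:tree -Dincludes=org.apache.httpcomponents
```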
>>
>> I need Hive, so I can't really do a build without it. Even if I exclude
>> the httpclient dependency from my project's build, it will not solve the
>> problem, because the AWS SDK has been compiled against a newer version of
>> httpclient. My Spark streaming project does not use httpclient directly.
>> The AWS SDK will look for the class
>> org.apache.http.impl.conn.DefaultClientConnectionOperator, and it will be
>> loaded from the spark-assembly jar regardless of how I package my project
>> (unless I am missing something?). I enabled verbose class loading to
>> confirm that the class is indeed loaded from the spark-assembly jar.
>>
>> The spark.files.userClassPathFirst option doesn't seem to be working on
>> my Spark 1.0.2 build (not sure why).
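For reference, the option is typically set in the application's Spark configuration; a minimal sketch (in Spark 1.x this setting is experimental and affects executor class loading):

```
# conf/spark-defaults.conf -- prefer classes from the user's jars over the
# Spark assembly on executors (experimental in Spark 1.x)
spark.files.userClassPathFirst  true
```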
>>
>> That left me with custom-building Spark and forcibly introducing the
>> latest httpclient version as a dependency.
>>
>> Finally, I tested this on 1.1.0-RC4 today and it has the same issue. Has
>> anyone ever been able to get the Kinesis example to work with the
>> spark-hadoop2.4 (with hive and yarn) build? I feel like this is a bug
>> that exists even in 1.1.0.
>>
>> I still believe we need a better solution to address the dependency hell
>> problem. If OSGi is deemed too over the top, what are the solutions being
>> investigated?
>>
>> On 6 September 2014 04:44, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > From output of dependency:tree:
>> >
>> > [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @
>> > spark-streaming_2.10 ---
>> > [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
>> > [INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
>> > [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
>> > ...
>> > [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
>> > [INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
>> > [INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
>> > [INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
>> >
>> > bq. excluding httpclient from spark-streaming dependency in your
>> > sbt/maven project
>> >
>> > This should work.
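In an sbt build, the exclusion described here would look roughly like this (a sketch; coordinates follow the tree above, but your Spark version and the httpclient version the AWS SDK needs may differ):

```scala
// Exclude the transitive httpclient/httpcore pulled in via spark-streaming,
// then depend directly on the version the AWS SDK was compiled against.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.1.0"
    exclude("org.apache.httpcomponents", "httpclient")
    exclude("org.apache.httpcomponents", "httpcore"),
  "org.apache.httpcomponents" % "httpclient" % "4.2"
)
```

Note that, as discussed later in the thread, this only helps if the excluded classes are not also baked into the spark-assembly jar on the cluster.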
>> >
>> >
>> > On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das <
>> tathagata.das1565@gmail.com
>> > > wrote:
>> >
>> >> If httpClient dependency is coming from Hive, you could build Spark
>> >> without
>> >> Hive. Alternatively, have you tried excluding httpclient from
>> >> spark-streaming dependency in your sbt/maven project?
>> >>
>> >> TD
>> >>
>> >>
>> >>
>> >> On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers <koert@tresata.com>
>> wrote:
>> >>
>> >> > Custom Spark builds should not be the answer. At least not if Spark
>> >> > ever wants to have a vibrant community for Spark apps.
>> >> >
>> >> > Spark does support a user-classpath-first option, which would deal
>> >> > with some of these issues, but I don't think it works.
>> >> > On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego" <fborrego@gilt.com>
>> >> wrote:
>> >> >
>> >> > > Hi,
>> >> > > I ran into the same issue and, apart from the ideas Aniket
>> >> > > mentioned, I could only find a nasty workaround: adding my custom
>> >> > > PoolingClientConnectionManager to my classpath.
>> >> > >
>> >> > > http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen <sowen@cloudera.com>
>> >> wrote:
>> >> > >
>> >> > > > Dumb question -- are you using a Spark build that includes the
>> >> > > > Kinesis dependency? That build would have resolved conflicts like
>> >> > > > this for you. Your app would need to use the same version of the
>> >> > > > Kinesis client SDK, ideally.
>> >> > > >
>> >> > > > All of these ideas are well-known, yes. In cases of super-common
>> >> > > > dependencies like Guava, they are already shaded. This is a
>> >> > > > less-common source of conflicts, so I don't think http-client is
>> >> > > > shaded, especially since it is not used directly by Spark. I
>> >> > > > think this is a case of your app conflicting with a third-party
>> >> > > > dependency?
>> >> > > >
>> >> > > > I think OSGi is deemed too over the top for things like this.
>> >> > > >
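The shading mentioned here is typically done with the maven-shade-plugin's relocation feature; a sketch of what shading httpclient could look like (the shaded package name is illustrative, not what Spark actually uses):

```xml
<!-- Relocate org.apache.http into a framework-private package so user code
     can bring its own httpclient version (illustrative configuration). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>org.spark_project.http</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```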
>> >> > > > On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
>> >> > > > <aniket.bhatnagar@gmail.com> wrote:
>> >> > > > > I am trying to use Kinesis as a source for Spark Streaming and
>> >> > > > > have run into a dependency issue that can't be resolved without
>> >> > > > > making my own custom Spark build. The issue is that Spark is
>> >> > > > > transitively dependent on
>> >> > > > > org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because
>> >> > > > > of libfb303 coming from hbase and hive-serde), whereas the AWS
>> >> > > > > SDK is dependent on
>> >> > > > > org.apache.httpcomponents:httpclient:jar:4.2. When I package and
>> >> > > > > run the Spark Streaming application, I get the following:
>> >> > > > >
>> >> > > > > Caused by: java.lang.NoSuchMethodError:
>> >> > > > > org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
>> >> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
>> >> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
>> >> > > > >         at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
>> >> > > > >         at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
>> >> > > > >         at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
>> >> > > > >         at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
>> >> > > > >         at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
>> >> > > > >         at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
>> >> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
>> >> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
>> >> > > > >         at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
>> >> > > > >
>> >> > > > > I can create a custom Spark build with
>> >> > > > > org.apache.httpcomponents:httpclient:jar:4.2 included in the
>> >> > > > > assembly, but I was wondering if this is something Spark devs
>> >> > > > > have noticed and are looking to resolve in upcoming releases.
>> >> > > > > Here are my thoughts on this issue:
>> >> > > > >
>> >> > > > > Containers that allow running custom user code often have to
>> >> > > > > resolve dependency conflicts between the framework's and the
>> >> > > > > user code's dependencies. Here is how I have seen some
>> >> > > > > frameworks resolve the issue:
>> >> > > > > 1. Provide a child-first class loader: Some JEE containers
>> >> > > > > provided a child-first class loader that allowed classes to be
>> >> > > > > loaded from user code first. I don't think this approach
>> >> > > > > completely solves the problem, as the framework is then
>> >> > > > > susceptible to class mismatch errors.
>> >> > > > > 2. Fold all dependencies into a sub-package: This approach
>> >> > > > > involves relocating all dependencies into a project-specific
>> >> > > > > sub-package (like spark.dependencies). This approach is tedious
>> >> > > > > because it involves building custom versions of all
>> >> > > > > dependencies (and their transitive dependencies).
>> >> > > > > 3. Use something like OSGi: Some frameworks have successfully
>> >> > > > > used OSGi to manage dependencies between modules. The challenge
>> >> > > > > in this approach is to OSGify the framework and hide OSGi
>> >> > > > > complexities from the end user.
>> >> > > > >
>> >> > > > > My personal preference is OSGi (or at least some support for
>> >> > > > > OSGi), but I would love to hear what Spark devs are thinking in
>> >> > > > > terms of resolving the problem.
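The child-first class loader in option 1 can be sketched in a few lines of Java (a minimal illustration of the delegation order only, not Spark's actual userClassPathFirst implementation; the class names are made up):

```java
import java.net.URL;
import java.net.URLClassLoader;

// A child-first (parent-last) class loader: it tries the user's own jars
// before delegating to the parent, except for core java.* classes.
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // look in the child's URLs first
                } catch (ClassNotFoundException e) {
                    // not found locally; fall back to the parent below
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // normal parent delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        // With no user jars, every lookup falls back to the parent loader.
        ChildFirstClassLoader cl =
            new ChildFirstClassLoader(new URL[0], Demo.class.getClassLoader());
        System.out.println(cl.loadClass("java.util.ArrayList")
                == java.util.ArrayList.class);
    }
}
```

The "class mismatch errors" mentioned above arise because two loaders that each define a class with the same name produce distinct runtime classes, so objects crossing the framework/user boundary can fail with ClassCastException or LinkageError.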
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > > Aniket
>> >> > > >
>> >> > > >
>> >> ---------------------------------------------------------------------
>> >> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> >> > > > For additional commands, e-mail: dev-help@spark.apache.org
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
