Subject: Re: “mapreduce.job.user.classpath.first” for Spark
From: Corey Nolet <cjnolet@gmail.com>
Date: Wed, 4 Feb 2015 07:47:19 -0500
To: bo yang
Cc: medale@acm.org, user@spark.apache.org

Bo Yang,

I am using Spark 1.2.0, and there are undoubtedly older Guava classes being picked up and serialized with the closures when they are sent from the driver to the executors, because the classes' serial version ids don't match between the driver and the executors. Have you tried doing this? Guava works fine for me when this is not the case, but as soon as a Guava class that was changed between versions <15.0 and 15.0 is serialized, it fails. See [1] for more info; we did fairly extensive testing last night. I've isolated the issue to Hadoop's really old version of Guava being picked up. Again, this is only noticeable when classes from Guava 15.0 that were changed from previous versions are used, and those classes are serialized on the driver and shipped to the executors.

[1] https://github.com/calrissian/mango/issues/158

On Wed, Feb 4, 2015 at 1:31 AM, bo yang wrote:
> Corey,
>
> Which version of Spark do you use? I am using Spark 1.2.0, and Guava
> 15.0. It seems fine.
>
> Best,
> Bo
>
> On Tue, Feb 3, 2015 at 8:56 PM, M. Dale wrote:
>
>> Try spark.yarn.user.classpath.first (see
>> https://issues.apache.org/jira/browse/SPARK-2996 - only works for YARN).
>> Also see the thread at
>> http://apache-spark-user-list.1001560.n3.nabble.com/netty-on-classpath-when-using-spark-submit-td18030.html
>> .
>>
>> HTH,
>> Markus
>>
>> On 02/03/2015 11:20 PM, Corey Nolet wrote:
>>
>>> I'm having a really bad dependency conflict right now with Guava versions
>>> between my Spark application in YARN and (I believe) Hadoop's version.
>>>
>>> The problem is, my driver has the version of Guava which my application
>>> is expecting (15.0), while it appears the Spark executors that are working
>>> on my RDDs have a much older version (presumably the old version on the
>>> Hadoop classpath).
>>>
>>> Is there a property like "mapreduce.job.user.classpath.first" that I
>>> can set to make sure my own classpath is established first on the executors?
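Following up on Markus's suggestion above, a minimal sketch of how that YARN-only property could be passed at submit time. The jar name and main class below are hypothetical placeholders, and on later Spark releases the equivalent settings became spark.driver.userClassPathFirst and spark.executor.userClassPathFirst:

```
# Ask YARN-mode Spark to consult the user's jars before the Spark/Hadoop
# classpath on the executors (SPARK-2996; YARN only, Spark 1.2-era).
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.user.classpath.first=true \
  --class com.example.MyApp \
  my-app-assembly.jar
```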
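For what it's worth, another way out of this class of conflict, independent of any classpath-first flag, is to shade and relocate Guava inside the application's assembly jar so the older Guava on the Hadoop/YARN classpath can never collide with it. A minimal, hypothetical maven-shade-plugin sketch, assuming the application builds with Maven (the shaded package prefix is made up):

```
<!-- Relocate com.google.common so the app's Guava 15.0 classes cannot
     clash with the older Guava on the Hadoop/YARN classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Note that relocation only rewrites references inside the application's own jar; it does not change which Guava Spark or Hadoop use internally.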