Subject: Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?
From: Doug Balog
Date: Thu, 22 Oct 2015 03:05:07 -0400
To: Chester Chen
Cc: dev@spark.apache.org
Message-Id: <181CF4DB-0174-4701-9AC7-32813FA43C2F@dugos.com>

> On Oct 21, 2015, at 8:45 PM, Chester Chen wrote:
>
> Doug,
>   thanks for responding.
>
> >> I think Spark just needs to be compiled against 1.2.1
>
> Can you elaborate on this, or point to the specific command you are referring to?
>
> In our build.scala, I was including the following:
>
>     "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" intransitive()
>
> I am not sure how the Spark compilation is directly related to this, please explain.

I was referring to this comment:
https://issues.apache.org/jira/browse/SPARK-6906?focusedCommentId=14712336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14712336

And the updated documentation:
http://spark.apache.org/docs/latest/sql-programming-guide.html#interacting-with-different-versions-of-hive-metastore

Perhaps I misunderstood your question and why you are trying to compile against a different version of Hive.

> When we submit the spark job, we call the Spark Yarn Client.scala directly (not via spark-submit).
> The client side does not depend on the spark-assembly jar (which is in the hadoop cluster).
> The job submission actually failed on the client side.
>
> Currently we get around this by replacing Spark's hive-exec with the Apache hive-exec.

Why are you using the Spark Yarn Client.scala directly and not the SparkLauncher that was introduced in 1.4.0?
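For what it's worth, submission through SparkLauncher looks roughly like this. This is an
untested sketch: the spark home, jar path, main class, and kerberos settings are all
placeholders for your own values.

    import org.apache.spark.launcher.SparkLauncher

    object SubmitViaLauncher {
      def main(args: Array[String]): Unit = {
        // Programmatic YARN cluster-mode submission; every path, class
        // name, and kerberos value below is a placeholder.
        val spark: Process = new SparkLauncher()
          .setSparkHome("/opt/spark")                           // assumed Spark install dir
          .setAppResource("/path/to/your-app.jar")              // your application jar
          .setMainClass("com.example.YourApp")                  // your driver main class
          .setMaster("yarn-cluster")                            // cluster deploy mode on YARN
          .setConf("spark.yarn.principal", "user@EXAMPLE.COM")  // kerberos principal (placeholder)
          .setConf("spark.yarn.keytab", "/path/to/user.keytab") // matching keytab (placeholder)
          .launch()                                             // forks spark-submit as a child process
        spark.waitFor()                                         // block until the submission finishes
      }
    }

It keeps your code off Client.scala's private internals, so changes like the 1.5.0 Hive
rework don't break you at submission time.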
Doug

> On Wed, Oct 21, 2015 at 5:27 PM, Doug Balog wrote:
>
> See comments below.
>
> > On Oct 21, 2015, at 5:33 PM, Chester Chen wrote:
> >
> > All,
> >
> > just to see if this happens to others as well.
> >
> > This is tested against spark 1.5.1 (branch 1.5 with label 1.5.2-SNAPSHOT,
> > commit 84f510c4fa06e43bd35e2dc8e1008d0590cbe266 from Tue Oct 6).
> >
> > Spark deployment mode: Spark-Cluster
> >
> > Notice that if we enable Kerberos mode, the spark yarn client fails with the following:
> >
> > Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
> > java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.apache.spark.deploy.yarn.Client$.org$apache$spark$deploy$yarn$Client$$obtainTokenForHiveMetastore(Client.scala:1252)
> >         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:271)
> >         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
> >         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
> >         at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
> >
> > Diving into the Yarn Client.scala code and testing against different dependencies, we noticed
> > the following: if kerberos mode is enabled, Client.obtainTokenForHiveMetastore() uses scala
> > reflection to load Hive and HiveConf and invoke methods on them:
> >
> >     val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
> >     val hive = hiveClass.getMethod("get").invoke(null)
> >
> >     val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
> >     val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
> >
> >     val hiveConfGet = (param: String) => Option(hiveConfClass
> >       .getMethod("get", classOf[java.lang.String])
> >       .invoke(hiveConf, param))
> >
> > If "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" is used, you get the above exception.
> > But if we use "org.apache.hive" % "hive-exec" % "0.13.1-cdh5.2.0",
> > the above method does not throw an exception.
> >
> > Here are some questions and comments:
> >
> > 0) is this a bug?
>
> I'm not an expert on this, but I think this might not be a bug.
> The Hive integration was redone for 1.5.0, see https://issues.apache.org/jira/browse/SPARK-6906
> and I think Spark just needs to be compiled against 1.2.1
>
> > 1) Why do the spark-project and apache hive-exec jars behave differently? I understand the
> >    spark-project hive-exec has fewer dependencies, but I would expect it to be functionally
> >    the same.
>
> I don't know.
>
> > 2) Where can I find the source code for the spark-project hive-exec?
>
> I don't know.
>
> > 3) regarding the method obtainTokenForHiveMetastore(),
> >    I would assume the method would first check whether the hive metastore uri is present
> >    before trying to get the hive metastore tokens; it seems to invoke the reflection
> >    regardless of whether the hive service is enabled in the cluster or not.
>
> Checking to see if the hive metastore uri is present before trying to get a delegation token
> would be an improvement. Also checking to see if we are running in cluster mode would be
> good, too. I will file a JIRA and make these improvements.
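Roughly what I have in mind for the JIRA, as an untested sketch against the reflection code
quoted above (hiveConfGet is the helper shown there; I'm assuming Client's isClusterMode
flag is in scope):

    // Skip the metastore token fetch entirely when no remote metastore is
    // configured: hive.metastore.uris is empty for local-only setups.
    val metastoreUris = hiveConfGet("hive.metastore.uris")
    if (isClusterMode && metastoreUris.exists(_.toString.trim.nonEmpty)) {
      // ... existing delegation-token fetching via reflection ...
    } else {
      logDebug("No hive.metastore.uris set or not in cluster mode, skipping metastore token")
    }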
>
> > 4) Noticed that obtainTokenForHBase() in the same class (Client.scala) catches
> >
> >     case e: java.lang.NoClassDefFoundError => logDebug("HBase Class not found: " + e)
> >
> >    and just ignores the exception (logs at debug level), but obtainTokenForHiveMetastore()
> >    does not catch NoClassDefFoundError; I guess this is the problem.
> >
> >     private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
> >       // rest of code
> >       } catch {
> >         case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }
> >         case e: java.lang.ClassNotFoundException => { logInfo("Hive Class not found " + e); return }
> >         case e: Exception => { logError("Unexpected Exception " + e)
> >           throw new RuntimeException("Unexpected exception", e)
> >         }
> >       }
> >     }
>
> I tested the code against different scenarios; it's possible that I missed the case where
> the class was not found. obtainTokenForHBase() was implemented after obtainTokenForHive().
>
> Cheers,
>
> Doug
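P.S. For (4), I'd expect the fix to mirror what obtainTokenForHBase() already does. Note
that NoClassDefFoundError is an Error, not an Exception, so the existing catch-all clause
never matches it, and it propagates up and kills the submission. An untested sketch, with
the method body elided:

    private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
      try {
        // ... existing reflection-based token fetching ...
      } catch {
        case e: java.lang.NoSuchMethodException => logInfo("Hive Method not found " + e)
        case e: java.lang.ClassNotFoundException => logInfo("Hive Class not found " + e)
        // New clause: swallow the class-initialization failure the same way
        // obtainTokenForHBase() does, instead of letting it escape.
        case e: java.lang.NoClassDefFoundError => logDebug("Hive Class not found: " + e)
        case e: Exception =>
          logError("Unexpected Exception " + e)
          throw new RuntimeException("Unexpected exception", e)
      }
    }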