Subject: Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?
From: Doug Balog
Date: Thu, 22 Oct 2015 03:05:07 -0400
To: Chester Chen
Cc: dev@spark.apache.org
Message-Id: <181CF4DB-0174-4701-9AC7-32813FA43C2F@dugos.com>

> On Oct 21, 2015, at 8:45 PM, Chester Chen wrote:
>
> Doug,
>   thanks for responding.
>
> >> I think Spark just needs to be compiled against 1.2.1
>
> Can you elaborate on this, or point to the specific command you are referring to?
>
> In our build.scala, I was including the following:
>
>     "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" intransitive()
>
> I am not sure how the Spark compilation is directly related to this, please explain.

I was referring to this comment:
https://issues.apache.org/jira/browse/SPARK-6906?focusedCommentId=14712336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14712336

And the updated documentation:
http://spark.apache.org/docs/latest/sql-programming-guide.html#interacting-with-different-versions-of-hive-metastore

Perhaps I misunderstood your question and why you are trying to compile against a different version of Hive.

> When we submit the spark job, we call the Spark Yarn Client.scala directly (not via spark-submit).
> The client side does not depend on the spark-assembly jar (which is in the hadoop cluster).
> The job submission actually failed on the client side.
>
> Currently we get around this by replacing Spark's hive-exec with the Apache hive-exec.

Why are you using the Spark Yarn Client.scala directly and not the SparkLauncher that was introduced in 1.4.0?
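For what it's worth, submission through SparkLauncher looks roughly like this. This is an
untested sketch: the spark home, jar path, main class, and kerberos settings are all
placeholders for your own values.

    import org.apache.spark.launcher.SparkLauncher

    object SubmitViaLauncher {
      def main(args: Array[String]): Unit = {
        // Programmatic YARN cluster-mode submission; every path, class
        // name, and kerberos value below is a placeholder.
        val spark: Process = new SparkLauncher()
          .setSparkHome("/opt/spark")                           // assumed Spark install dir
          .setAppResource("/path/to/your-app.jar")              // your application jar
          .setMainClass("com.example.YourApp")                  // your driver main class
          .setMaster("yarn-cluster")                            // cluster deploy mode on YARN
          .setConf("spark.yarn.principal", "user@EXAMPLE.COM")  // kerberos principal (placeholder)
          .setConf("spark.yarn.keytab", "/path/to/user.keytab") // matching keytab (placeholder)
          .launch()                                             // forks spark-submit as a child process
        spark.waitFor()                                         // block until the submission finishes
      }
    }

It keeps your code off Client.scala's private internals, so changes like the 1.5.0 Hive
rework don't break you at submission time.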
Doug

> On Wed, Oct 21, 2015 at 5:27 PM, Doug Balog wrote:
>
> See comments below.
>
> > On Oct 21, 2015, at 5:33 PM, Chester Chen wrote:
> >
> > All,
> >
> > just to see if this happens to others as well.
> >
> > This is tested against spark 1.5.1 (branch 1.5 with label 1.5.2-SNAPSHOT,
> > commit 84f510c4fa06e43bd35e2dc8e1008d0590cbe266 from Tue Oct 6).
> >
> > Spark deployment mode: Spark-Cluster
> >
> > Notice that if we enable Kerberos mode, the spark yarn client fails with the following:
> >
> > Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
> > java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.apache.spark.deploy.yarn.Client$.org$apache$spark$deploy$yarn$Client$$obtainTokenForHiveMetastore(Client.scala:1252)
> >         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:271)
> >         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
> >         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
> >         at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
> >
> > Diving into the Yarn Client.scala code and testing against different dependencies, we noticed
> > the following: if kerberos mode is enabled, Client.obtainTokenForHiveMetastore() uses scala
> > reflection to load Hive and HiveConf and invoke methods on them:
> >
> >     val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
> >     val hive = hiveClass.getMethod("get").invoke(null)
> >
> >     val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
> >     val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
> >
> >     val hiveConfGet = (param: String) => Option(hiveConfClass
> >       .getMethod("get", classOf[java.lang.String])
> >       .invoke(hiveConf, param))
> >
> > If "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" is used, you get the above exception.
> > But if we use "org.apache.hive" % "hive-exec" % "0.13.1-cdh5.2.0",
> > the above method does not throw an exception.
> >
> > Here are some questions and comments:
> >
> > 0) is this a bug?
>
> I'm not an expert on this, but I think this might not be a bug.
> The Hive integration was redone for 1.5.0, see https://issues.apache.org/jira/browse/SPARK-6906
> and I think Spark just needs to be compiled against 1.2.1
>
> > 1) Why do the spark-project and apache hive-exec jars behave differently? I understand the
> >    spark-project hive-exec has fewer dependencies, but I would expect it to be functionally
> >    the same.
>
> I don't know.
>
> > 2) Where can I find the source code for the spark-project hive-exec?
>
> I don't know.
>
> > 3) regarding the method obtainTokenForHiveMetastore(),
> >    I would assume the method would first check whether the hive metastore uri is present
> >    before trying to get the hive metastore tokens; it seems to invoke the reflection
> >    regardless of whether the hive service is enabled in the cluster or not.
>
> Checking to see if the hive metastore uri is present before trying to get a delegation token
> would be an improvement. Also checking to see if we are running in cluster mode would be
> good, too. I will file a JIRA and make these improvements.
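Roughly what I have in mind for the JIRA, as an untested sketch against the reflection code
quoted above (hiveConfGet is the helper shown there; I'm assuming Client's isClusterMode
flag is in scope):

    // Skip the metastore token fetch entirely when no remote metastore is
    // configured: hive.metastore.uris is empty for local-only setups.
    val metastoreUris = hiveConfGet("hive.metastore.uris")
    if (isClusterMode && metastoreUris.exists(_.toString.trim.nonEmpty)) {
      // ... existing delegation-token fetching via reflection ...
    } else {
      logDebug("No hive.metastore.uris set or not in cluster mode, skipping metastore token")
    }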
>
> > 4) Noticed that obtainTokenForHBase() in the same class (Client.scala) catches
> >
> >     case e: java.lang.NoClassDefFoundError => logDebug("HBase Class not found: " + e)
> >
> >    and just ignores the exception (logs at debug level), but obtainTokenForHiveMetastore()
> >    does not catch NoClassDefFoundError; I guess this is the problem.
> >
> >     private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
> >       // rest of code
> >       } catch {
> >         case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }
> >         case e: java.lang.ClassNotFoundException => { logInfo("Hive Class not found " + e); return }
> >         case e: Exception => { logError("Unexpected Exception " + e)
> >           throw new RuntimeException("Unexpected exception", e)
> >         }
> >       }
> >     }
>
> I tested the code against different scenarios; it's possible that I missed the case where
> the class was not found. obtainTokenForHBase() was implemented after obtainTokenForHive().
>
> Cheers,
>
> Doug
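P.S. For (4), I'd expect the fix to mirror what obtainTokenForHBase() already does. Note
that NoClassDefFoundError is an Error, not an Exception, so the existing catch-all clause
never matches it, and it propagates up and kills the submission. An untested sketch, with
the method body elided:

    private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
      try {
        // ... existing reflection-based token fetching ...
      } catch {
        case e: java.lang.NoSuchMethodException => logInfo("Hive Method not found " + e)
        case e: java.lang.ClassNotFoundException => logInfo("Hive Class not found " + e)
        // New clause: swallow the class-initialization failure the same way
        // obtainTokenForHBase() does, instead of letting it escape.
        case e: java.lang.NoClassDefFoundError => logDebug("Hive Class not found: " + e)
        case e: Exception =>
          logError("Unexpected Exception " + e)
          throw new RuntimeException("Unexpected exception", e)
      }
    }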