spark-issues mailing list archives

From "Wojciech Indyk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-14115) YARN + SparkSql + Hive + HBase + Kerberos doesn't work
Date Thu, 24 Mar 2016 15:12:25 GMT

    [ https://issues.apache.org/jira/browse/SPARK-14115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210366#comment-15210366 ]

Wojciech Indyk commented on SPARK-14115:
----------------------------------------

One more important piece of information: it doesn't work in yarn-cluster mode. With a local master it works fine with hive-hbase-handler 2.0.0 (I've updated the title).

> YARN + SparkSql + Hive + HBase + Kerberos doesn't work
> ------------------------------------------------------
>
>                 Key: SPARK-14115
>                 URL: https://issues.apache.org/jira/browse/SPARK-14115
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: Spark 1.6.1 compiled with Hadoop 2.7.1, YARN and Hive
> Hadoop 2.7.1 (HDP 2.3)
> Hive 1.2.1 (HDP 2.3)
> Kerberos
>            Reporter: Wojciech Indyk
>
> When I try to run SparkSql on a Hive table that is defined via the HBase storage handler, I get the following error:
> {code}
> ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:482)
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:427)
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:328)
> 	at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
> 	at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
> 	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> 	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> 	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> 	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> 	at scala.Option.map(Option.scala:145)
> 	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> 	at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> 	at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:72)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:69)
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate.doExecute(SortBasedAggregate.scala:69)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
> 	at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
> 	at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
> 	at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
> 	at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1581)
> 	at org.apache.spark.sql.DataFrame.cache(DataFrame.scala:1590)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:40)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:35)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$.main(App.scala:35)
> 	at pl.com.agora.bigdata.recommendations.ranking.App.main(App.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> {code}
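> For context, a minimal sketch of the driver code shape that hits this path (table, column and app names below are placeholders, not my actual job):
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.hive.HiveContext
>
> object App {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("hive-on-hbase-test"))
>     val sqlContext = new HiveContext(sc)
>     // "hbase_backed_table" stands for a Hive table created with
>     // STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     val df = sqlContext.sql("SELECT key, value FROM hbase_backed_table")
>     // cache() triggers partition computation on the underlying HadoopRDD,
>     // which is where configureInputJobProperties -> addHBaseDelegationToken
>     // is called (see the stack trace above); the real job additionally wraps
>     // this in UserGroupInformation.doAs, per the Subject.doAs frame.
>     df.cache()
>     df.count()
>   }
> }
> {code}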
> When I query a similar table defined as Avro on HDFS, everything works fine.
> I tried to use a newer version of hive-hbase-handler that contains the missing method. With hive-hbase-handler 2.0.0 I hit another issue, this time with Kerberos on the worker machines:
> At the beginning the logs show:
> {code}
> 16/03/24 15:26:11 DEBUG UserGroupInformation: User entry: "test-dataocean"
> 16/03/24 15:26:11 DEBUG UserGroupInformation: UGI loginUser:test-dataocean (auth:KERBEROS)
> 16/03/24 15:26:11 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
> {code}
> then
> {code}
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: RPC Server Kerberos principal name for service=ClientService is hbase/abc.abc.abc@abc.abc
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: Use KERBEROS authentication for service ClientService, sasl=true
> 16/03/24 15:26:45 DEBUG AbstractRpcClient: Connecting to abc.abc.abc/abc.abc.abc:16020
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:735)
> 16/03/24 15:26:45 DEBUG HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/abc.abc.abc@abc.abc
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedActionException as:test-dataocean (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 16/03/24 15:26:45 DEBUG UserGroupInformation: PrivilegedAction as:test-dataocean (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:638)
> 16/03/24 15:26:45 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 16/03/24 15:26:45 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> 	at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:612)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:157)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:738)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:735)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:735)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:897)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:866)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$CallSender.run(RpcClientImpl.java:267)
> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
> 	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> 	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
> 	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
> 	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
> 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
> 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
> 	... 12 more
> {code}
> The problem is that the authentication method is overridden to SIMPLE instead of KERBEROS; a possible workaround is sketched below.
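> One possible workaround (only a sketch, not verified in this environment, assuming the HBase 1.x client API shipped with HDP 2.3) would be to obtain the HBase delegation token explicitly on the driver, while the keytab login is still valid, and attach it to the current UGI so that executors use the token instead of attempting a GSSAPI handshake without a TGT:
> {code}
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.hadoop.hbase.security.token.TokenUtil
> import org.apache.hadoop.security.UserGroupInformation
>
> // Run on the driver after the keytab login (spark-submit --principal/--keytab).
> val hbaseConf = HBaseConfiguration.create()
> // obtainToken(Configuration) is deprecated in HBase 1.x but still available;
> // it fetches an HBase delegation token for the currently logged-in user.
> val token = TokenUtil.obtainToken(hbaseConf)
> // Attach the token to the current UGI so it is shipped with the job credentials.
> UserGroupInformation.getCurrentUser().addToken(token)
> {code}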
> I submit the job with the following parameters:
> {code}
> --principal test-dataocean --keytab /etc/security/keytabs/test-dataocean.keytab
> {code}
> and set the required Kerberos parameters for Hadoop, Hive and HBase. All of these configs work for SparkSql on Hive (on HDFS) and for HBase (without Spark), but not for SparkSql on Hive on HBase (hive-hbase-handler 2.0.0).
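> For completeness, the full submit command looks roughly like this (the jar name and the --files paths are placeholders for my actual setup; the class name is the one from the stack trace above):
> {code}
> spark-submit \
>   --master yarn-cluster \
>   --principal test-dataocean \
>   --keytab /etc/security/keytabs/test-dataocean.keytab \
>   --files /etc/hbase/conf/hbase-site.xml,/etc/hive/conf/hive-site.xml \
>   --class pl.com.agora.bigdata.recommendations.ranking.App \
>   app-assembly.jar
> {code}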



