hbase-user mailing list archives

From Aneela Saleem <ane...@platalytics.com>
Subject Re: submitting spark job with kerberized HBase issue
Date Wed, 10 Aug 2016 09:53:19 GMT
And I'm using the Apache distribution of Spark, not Cloudera's.

On Wed, Aug 10, 2016 at 12:06 PM, Aneela Saleem <aneela@platalytics.com>
wrote:

> Thanks Nkechi,
>
> I added this dependency as an external jar, but when I compile the code,
> unfortunately I get the following error:
>
> error: object cloudera is not a member of package com
> [ERROR] import com.cloudera.spark.hbase.HBaseContext
>
>
>
> On Tue, Aug 9, 2016 at 7:51 PM, Nkechi Achara <nkachara@googlemail.com>
> wrote:
>
>> Hi,
>>
>> Since we are not yet on HBase 2.0.0, we are using SparkOnHBase.
>>
>> Dependency:
>> <dependency>
>>   <groupId>com.cloudera</groupId>
>>   <artifactId>spark-hbase</artifactId>
>>   <version>0.0.2-clabs</version>
>> </dependency>
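>>
>> If you build with sbt rather than Maven, the equivalent would be roughly
>> the following. The resolver URL is my assumption of the usual Cloudera
>> repository location, since the clabs artifacts are, as far as I know, not
>> in Maven Central:
>>
>>     resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
>>     libraryDependencies += "com.cloudera" % "spark-hbase" % "0.0.2-clabs"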
>>
>> It only takes a small snippet of code to do a general scan using a start
>> and stop time as the scan time range:
>>
>>     val conf = new SparkConf().
>>       set("spark.shuffle.consolidateFiles", "true").
>>       set("spark.kryo.registrationRequired", "false").
>>       set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
>>       set("spark.kryoserializer.buffer", "30m").
>>       set("spark.shuffle.spill", "true").
>>       set("spark.shuffle.memoryFraction", "0.4")
>>
>>     val sc = new SparkContext(conf)
>>
>>     // hc is a com.cloudera.spark.hbase.HBaseContext built from the
>>     // SparkContext and the HBase configuration, e.g.
>>     // val hc = new HBaseContext(sc, HBaseConfiguration.create())
>>     val scan = new Scan()
>>     // addColumn takes byte[] family and qualifier
>>     scan.addColumn(Bytes.toBytes(columnName), Bytes.toBytes("column1"))
>>     scan.setTimeRange(scanRowStartTs, scanRowStopTs)
>>     hc.hbaseRDD(inputTableName, scan, filter)
>>
>> To run it, just use the following:
>>
>> spark-submit --class ClassName --master yarn-client --driver-memory
>> 2000M --executor-memory 5G --keytab <location of keytab> --principal
>> <principal> <application jar>
>>
>> That should work in a general way. Obviously you can utilise other scan /
>> put / get methods, etc., as in the sketch below.
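>>
>> For instance, a bulk put with the same module might look roughly like the
>> following. This is only a sketch: the table name "t1", the column family
>> "cf", the qualifier "c1", and the sample data are made up for
>> illustration, and hc is the same HBaseContext as above.
>>
>>     import org.apache.hadoop.hbase.client.Put
>>     import org.apache.hadoop.hbase.util.Bytes
>>
>>     val rdd = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))
>>     hc.bulkPut[(String, String)](rdd, "t1", (record) => {
>>       val put = new Put(Bytes.toBytes(record._1))
>>       // HBase 1.x Put.add(family, qualifier, value)
>>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes(record._2))
>>       put
>>     }, true)
>>
>> The last argument is the autoFlush flag, which, if I recall the clabs API
>> correctly, controls whether puts are flushed to HBase as they are written.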
>>
>> Thanks,
>>
>> Nkechi
>>
>> On 9 August 2016 at 15:20, Aneela Saleem <aneela@platalytics.com> wrote:
>>
>> > Thanks Nkechi,
>> >
>> > Can you please direct me to some code snippet using the HBase-on-Spark
>> > module? I've been trying that for the last few days but have not found
>> > a workaround.
>> >
>> >
>> >
>> > On Tue, Aug 9, 2016 at 6:13 PM, Nkechi Achara <nkachara@googlemail.com>
>> > wrote:
>> >
>> > > Hey,
>> > >
>> > > Have you tried the HBase-on-Spark module, or the spark-hbase module,
>> > > to connect? The principal and keytab options should work out of the
>> > > box for kerberized access. I can try your code if you don't have the
>> > > ability to use those modules.
>> > >
>> > > Thanks
>> > > K
>> > >
>> > > On 9 Aug 2016 2:25 p.m., "Aneela Saleem" <aneela@platalytics.com>
>> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I'm trying to connect to HBase with security enabled using a Spark
>> > > > job. I have kinit'd from the command line. When I run the following
>> > > > job, i.e.:
>> > > >
>> > > > /usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab
>> > > > --principal spark/hadoop-master@platalyticsrealm --class
>> > > > com.platalytics.example.spark.App --master yarn --driver-class-path
>> > > > /root/hbase-1.2.2/conf /home/vm6/project-1-jar-with-dependencies.jar
>> > > >
>> > > > I get the error:
>> > > >
>> > > > 2016-08-07 20:43:57,617 WARN [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1]
>> > > > ipc.RpcClientImpl: Exception encountered while connecting to the server :
>> > > > javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> > > > GSSException: No valid credentials provided (Mechanism level: Failed to
>> > > > find any Kerberos tgt)]
>> > > > 2016-08-07 20:43:57,619 ERROR [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1]
>> > > > ipc.RpcClientImpl: SASL authentication failed. The most likely cause is
>> > > > missing or invalid credentials. Consider 'kinit'.
>> > > > javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> > > > GSSException: No valid credentials provided (Mechanism level: Failed to
>> > > > find any Kerberos tgt)]
>> > > >   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>> > > >   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>> > > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>> > > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>> > > >   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>> > > >
>> > > > Following is my code:
>> > > >
>> > > > import java.security.PrivilegedExceptionAction
>> > > >
>> > > > import org.apache.hadoop.fs.Path
>> > > > import org.apache.hadoop.hbase.HBaseConfiguration
>> > > > import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>> > > > import org.apache.hadoop.security.UserGroupInformation
>> > > > import org.apache.spark.{SparkConf, SparkContext}
>> > > >
>> > > > System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
>> > > > System.setProperty("java.security.auth.login.config",
>> > > >   "/etc/hbase/conf/zk-jaas.conf")
>> > > >
>> > > > val hconf = HBaseConfiguration.create()
>> > > > val tableName = "emp"
>> > > > hconf.set("hbase.zookeeper.quorum", "hadoop-master")
>> > > > hconf.set(TableInputFormat.INPUT_TABLE, tableName)
>> > > > hconf.set("hbase.zookeeper.property.clientPort", "2181")
>> > > > hconf.set("hadoop.security.authentication", "kerberos")
>> > > > hconf.set("hbase.security.authentication", "kerberos")
>> > > > hconf.addResource(new Path("/etc/hbase/conf/core-site.xml"))
>> > > > hconf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
>> > > > UserGroupInformation.setConfiguration(hconf)
>> > > >
>> > > > // Log in from the keytab and run the job as that user
>> > > > val keyTab = "/etc/hadoop/conf/spark.keytab"
>> > > > val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
>> > > >   "spark/hadoop-master@platalyticsrealm", keyTab)
>> > > > UserGroupInformation.setLoginUser(ugi)
>> > > > ugi.doAs(new PrivilegedExceptionAction[Void]() {
>> > > >   override def run(): Void = {
>> > > >     val conf = new SparkConf
>> > > >     val sc = new SparkContext(conf)
>> > > >     sc.addFile(keyTab)
>> > > >     val hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
>> > > >       classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>> > > >       classOf[org.apache.hadoop.hbase.client.Result])
>> > > >     println("Number of Records found : " + hBaseRDD.count())
>> > > >     hBaseRDD.foreach(x => {
>> > > >       println(new String(x._2.getRow()))
>> > > >     })
>> > > >     sc.stop()
>> > > >     null
>> > > >   }
>> > > > })
>> > > >
>> > > > Please have a look and help me find the issue.
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>
>
