spark-user mailing list archives

From Sudhir Jangir <sud...@infoobjects.com>
Subject One question / kerberos, yarn-cluster -> connection to hbase
Date Wed, 24 May 2017 17:07:18 GMT
We are facing an issue with a Kerberos-enabled Hadoop/CDH cluster.

We are trying to run a streaming job on yarn-cluster that interacts with Kafka (via a direct stream) and HBase.

Somehow, we are not able to connect to HBase in cluster mode. We use a keytab to log in to HBase.

 

This is what we do:

spark-submit --master yarn-cluster \
  --keytab "dev.keytab" \
  --principal "dev@IO-INT.COM" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_executor_conf.properties -XX:+UseG1GC" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j_driver_conf.properties -XX:+UseG1GC" \
  --conf spark.yarn.stagingDir=hdfs:///tmp/spark/ \
  --files "job.properties,log4j_driver_conf.properties,log4j_executor_conf.properties" \
  service-0.0.1-SNAPSHOT.jar job.properties

 

To connect to HBase:

  def getHbaseConnection(properties: SerializedProperties): (Connection, UserGroupInformation) = {
    val config = HBaseConfiguration.create()
    config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_VALUE)
    // Configuration.set takes a String value, not an Int
    config.set("hbase.zookeeper.property.clientPort", "2181")
    config.set("hadoop.security.authentication", "kerberos")
    config.set("hbase.security.authentication", "kerberos")
    config.set("hbase.cluster.distributed", "true")
    config.set("hbase.rpc.protection", "privacy")
    config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@IO-INT.COM")
    config.set("hbase.master.kerberos.principal", "hbase/_HOST@IO-INT.COM")

    UserGroupInformation.setConfiguration(config)

    // Prefer the keytab shipped via --files (resolved through SparkFiles),
    // falling back to the local path from the job properties.
    val keytabPath =
      if (SparkFiles.get(properties.keytab) != null
          && new java.io.File(SparkFiles.get(properties.keytab)).exists) {
        SparkFiles.get(properties.keytab)
      } else {
        properties.keytab
      }
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      properties.kerberosPrincipal, keytabPath)

    val connection = ConnectionFactory.createConnection(config)
    (connection, ugi)
  }
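One thing we have been wondering about (this is an assumption on our side, not verified): the function above logs in and returns the UGI, but the HBase connection is created outside of it, so the RPCs may not actually run with those Kerberos credentials. A sketch of wrapping the connection creation in `ugi.doAs`, using the same `config` and `ugi` names as above, would look like:

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// Hypothetical: create the connection inside doAs so the HBase client
// picks up the credentials of the keytab-logged-in UGI, not the YARN
// container's current user.
val connection: Connection = ugi.doAs(new PrivilegedExceptionAction[Connection] {
  override def run(): Connection = ConnectionFactory.createConnection(config)
})
```

This compiles only against the HBase/Hadoop client jars and a real cluster, so it is just a sketch of the pattern.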

 

and we connect to HBase:

  ….foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      //var ugi: UserGroupInformation = Utils.getHbaseConnection(properties)._2
      rdd.foreachPartition { partition =>
        val connection = Utils.getHbaseConnection(propsObj)._1
        val table = …
        partition.foreach { json =>
          …
        }
        table.put(puts)
        table.close()
        connection.close()
      }
    }
  }

 

 

The keytab file is not getting copied to the YARN staging/temp directory, so we cannot resolve it via SparkFiles.get… And if we also pass the keytab with --files, spark-submit fails because the same file is already listed in --keytab.
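One workaround we are considering (an assumption on our side, not something we have confirmed works) is to ship a copy of the keytab under a different file name via --files, so it does not collide with the file distributed by --keytab, and have the executors resolve that copy; the name dev_hbase.keytab below is hypothetical:

```shell
# Hypothetical: copy the keytab under a different name so --keytab and
# --files do not reference the same file name.
cp dev.keytab dev_hbase.keytab

spark-submit --master yarn-cluster \
  --keytab "dev.keytab" --principal "dev@IO-INT.COM" \
  --files "job.properties,log4j_driver_conf.properties,log4j_executor_conf.properties,dev_hbase.keytab" \
  service-0.0.1-SNAPSHOT.jar job.properties
```

The executors would then look it up as SparkFiles.get("dev_hbase.keytab") instead of the original keytab name.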

 

Thanks,

Sudhir

