hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
Date Wed, 24 Feb 2016 00:51:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159982#comment-15159982 ]

Jonathan Hsieh commented on HBASE-15184:
----------------------------------------

I tested the patch on a live Kerberized cluster and it works for me. Here's how I did it,
for folks who'd like to duplicate:

Prereqs:
# Must have a Kerberos-enabled cluster (HBase/HDFS/YARN, etc.).
# Spark must be run in YARN containers (Kerberos doesn't work with Spark standalone mode).

Procedure:
# Loaded a table with 100k rows: 'hbase ltt -write 5:1000:160 -num_keys 100000 -tn ltt'
# Granted 'R' access to the 'randomuser' user (YARN needs a user with id > 1000): "grant 'randomuser', 'R', 'ltt'" in the hbase shell.
# Started spark-shell with the hbase classpath: 'sudo -u randomuser SPARK_CLASSPATH=`hbase classpath` spark-shell'
# Ran these lines in the spark shell:
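{code}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Scan
import org.apache.spark.sql.SQLContext
// Table loaded by the ltt command above
val tableName = "ltt"
// Reads hbase-site.xml from the classpath set via SPARK_CLASSPATH
val hbaseConf = HBaseConfiguration.create()
val hbaseContext = new HBaseContext(sc, hbaseConf)
val scan = new Scan()
// Fetch 100 rows per scanner round trip
scan.setCaching(100)
// RDD of (row key, Result) pairs, one per row in the table
val getRdd = hbaseContext.hbaseRDD(TableName.valueOf(tableName), scan)
// Print every row key; this runs on the executors
getRdd.foreach(v => println(Bytes.toString(v._1.get())))
// Copy the row keys back to the driver and count them
println("Length: " + getRdd.map(r => r._1.copyBytes()).collect().length)
{code}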
# got 100k count, declare victory
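
A possible variant of the final check, sketched under the assumption that the spark-shell session above is still open (so getRdd is in scope). count() is the standard Spark RDD action and does the counting on the executors, so the 100k row keys need not be copied to the driver just to take their length. The name rowCount is only illustrative.

{code}
// Assumes getRdd from the spark-shell session above
val rowCount = getRdd.count()
println("Length: " + rowCount)
{code}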



> SparkSQL Scan operation doesn't work on kerberos cluster
> --------------------------------------------------------
>
>                 Key: HBASE-15184
>                 URL: https://issues.apache.org/jira/browse/HBASE-15184
>             Project: HBase
>          Issue Type: Bug
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into an issue
> with the Scan.
> I made a fix for the client but we need to put it back into HBase.  I will attach my
> solution, but it has a major problem: I had to override a protected class in Spark.  I will
> need help to discover a better approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
