hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: question about SparkSQL loading hbase tables
Date Wed, 29 Jun 2016 03:23:33 GMT
There is no HBase release with full support for SparkSQL yet.
For #1, the classes / directories are (master branch):

./hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext
./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext

hbase-spark/src/main/scala/org/apache/spark/sql/datasources/hbase/HBaseTableCatalog.scala

./hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/datasources/HBaseSparkConf.scala
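
Since the hbase-spark module only exists in the master branch, it has to be built locally before it can be referenced. Assuming a locally built snapshot is installed into the local repository, the Maven dependency might look like this (the version string is hypothetical and depends on what your build of master produces):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <!-- hypothetical version; use whatever your local build installs -->
  <version>2.0.0-SNAPSHOT</version>
</dependency>
```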

For documentation, see HBASE-15473.
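
For reference, the connector describes the mapping between an HBase table and a DataFrame with a JSON catalog, passed under the HBaseTableCatalog.tableCatalog option. A minimal sketch (the table name, column family, and column names here are hypothetical; adapt them to your schema):

```json
{
  "table": {"namespace": "default", "name": "pv"},
  "rowkey": "key",
  "columns": {
    "col0": {"cf": "rowkey", "col": "key", "type": "string"},
    "col1": {"cf": "cf1", "col": "count", "type": "long"}
  }
}
```

A string holding JSON of this shape is what the read in the quoted code below expects in place of writeCatalog, i.e. sqlContext.read.options(Map(HBaseTableCatalog.tableCatalog -> catalog)).format("org.apache.hadoop.hbase.spark").load().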


On Tue, Jun 28, 2016 at 7:13 PM, 罗辉 <luohui@ifeng.com> wrote:

> Hi there,
>
>      I am using SparkSQL to read from HBase; however:
>
> 1.       I find some APIs are not available in my dependencies. Where should
> I add them?
>
> org.apache.hadoop.hbase.spark.example.hbasecontext
>
> org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
>
> org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf
>
> 2.       Is there complete example code showing how to use SparkSQL to
> read/write from HBase?
>
> The document I referred to is this:
> http://hbase.apache.org/book.html#_sparksql_dataframes. It seems that this
> documents a 2.0 snapshot, while I am using HBase 1.2.1 + Spark 1.6.1 +
> Hadoop 2.7.1.
>
>
>
> In my app, I want to load the entire HBase table into SparkSQL.
>
> My code:
>
>
>
> import org.apache.spark._
> import org.apache.hadoop.hbase._
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.hadoop.hbase.spark.example.hbasecontext
> import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog
> import org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf
>
> object HbaseConnector {
>   def main(args: Array[String]) {
>     val tableName = args(0)
>     val sparkMasterUrlDev = "spark://hadoopmaster:7077"
>     val sparkMasterUrlLocal = "local[2]"
>
>     val sparkConf = new SparkConf()
>       .setAppName("HbaseConnector for table " + tableName)
>       .setMaster(sparkMasterUrlDev)
>       .set("spark.executor.memory", "10g")
>     val sc = new SparkContext(sparkConf)
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
>     val conf = new HBaseConfiguration()
>     conf.set("hbase.zookeeper.quorum", "z1,z2,z3")
>     conf.set("hbase.zookeeper.property.clientPort", "2181")
>     conf.set("hbase.rootdir", "hdfs://hadoopmaster:8020/hbase")
>     // val hbaseContext = new HBaseContext(sc, conf)
>
>     // NOTE: writeCatalog and tsSpecified are not defined in this snippet
>     val pv = sqlContext.read
>       .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog,
>         HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
>       .format("org.apache.hadoop.hbase.spark")
>       .load()
>     pv.write.saveAsTable(tableName)
>   }
> }
>
>
>
> My POM file is attached as well.
>
>
>
> Thanks for the help.
>
>
>
> San.Luo
>
