spark-issues mailing list archives

From "Aniruddh Tiwari (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-5356) Write to Hbase from Spark
Date Wed, 21 Jan 2015 20:28:34 GMT
Aniruddh Tiwari created SPARK-5356:
--------------------------------------

             Summary: Write to Hbase from Spark
                 Key: SPARK-5356
                 URL: https://issues.apache.org/jira/browse/SPARK-5356
             Project: Spark
          Issue Type: Question
          Components: Examples, Spark Shell
    Affects Versions: 1.1.0
         Environment: Linux
            Reporter: Aniruddh Tiwari


I am able to read from HBase in Spark, but I am not able to write rows to HBase from Spark.
I am on Cloudera 5.0 (Spark 1.1.0 and HBase 0.98.6). So far, this is what I have:

I have an RDD, localData. How can I save it to HBase, and how do I use saveAsHadoopDataset?
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.mapred.JobConf
// Create an RDD of text lines from the input file
val localData = sc.textFile("/home/hbase_example/antiwari/scala_code/resources/scala_load_file.txt")
// HBase client configuration pointing at the local ZooKeeper quorum
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "localhost")
conf.set("hbase.zookeeper.property.clientPort", "2181")
// Old-API (mapred) job configuration, which is what saveAsHadoopDataset takes
val jobConfig: JobConf = new JobConf(conf, this.getClass)
jobConfig.setOutputFormat(classOf[TableOutputFormat])
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, "spark_data")
/* Contents of scala_load_file.txt
0000000001, Name01, Field1
0000000002, Name02, Field2
0000000003, Name03, Field3
0000000004, Name04, Field4
*/
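localData is an RDD[String], one line per record, with a space after each comma. The convert function below, however, expects an (Int, String, String) triple, so each line has to be split and trimmed first. Here is a minimal sketch that assumes every line has exactly three fields and a numeric first field. The names ParseExample and parseLine are hypothetical, not part of Spark or HBase:

```scala
// Hypothetical helper object; wrapping the function in an object keeps the
// closure simple to ship to executors when it is used inside localData.map(...).
object ParseExample {
  // Turns "0000000001, Name01, Field1" into (1, "Name01", "Field1").
  // Assumes exactly three comma-separated fields and a numeric first field.
  def parseLine(line: String): (Int, String, String) = {
    val fields = line.split(",").map(_.trim)
    require(fields.length == 3, s"expected 3 fields, got ${fields.length}: $line")
    (fields(0).toInt, fields(1), fields(2))
  }
}
```

In the shell this would be used as `val triples = localData.map(ParseExample.parseLine)`. Note that toInt drops the zero padding, and Bytes.toBytes on an Int stores a 4-byte binary row key. If the zero-padded text is what should appear as the HBase row key, keeping the first field as a String is the simpler choice.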

I looked at many examples online, including (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_import.html...), but I get the following error (maybe because I am on Spark 1.1.0 and this example is old):

scala> def convert(triple: (Int, String, String)) = {
| val p = new Put(Bytes.toBytes(triple._1))
| p.add(Bytes.toBytes("cf"),
| Bytes.toBytes("col_1"), Bytes.toBytes(triple._2))
| p.add(Bytes.toBytes("cf"),
| Bytes.toBytes("col_2"), Bytes.toBytes(triple._3))
| (new ImmutableBytesWritable, p)
| }
<console>:18: error: not found: type Put
val p = new Put(Bytes.toBytes(triple._1))
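The `not found: type Put` error most likely means that Put was never imported into the shell session. Bytes, which the same function uses, was not imported either. Both classes exist in HBase 0.98 as org.apache.hadoop.hbase.client.Put and org.apache.hadoop.hbase.util.Bytes. Below is a minimal sketch of the corrected function plus the write through saveAsHadoopDataset. It assumes the jobConfig set up above, an existing table spark_data with column family cf (TableOutputFormat does not create tables), and an RDD of (Int, String, String) built from localData. The names HBaseWrite, write and triples are hypothetical:

```scala
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed explicitly before Spark 1.3
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object for the conversion and the write.
object HBaseWrite {
  private val cf = Bytes.toBytes("cf")

  // One Put per record: the Int is the row key, the two strings go to cf:col_1 and cf:col_2.
  // Put.add(family, qualifier, value) is the HBase 0.98 form (later versions prefer addColumn).
  def convert(triple: (Int, String, String)): (ImmutableBytesWritable, Put) = {
    val rowKey = Bytes.toBytes(triple._1)
    val p = new Put(rowKey)
    p.add(cf, Bytes.toBytes("col_1"), Bytes.toBytes(triple._2))
    p.add(cf, Bytes.toBytes("col_2"), Bytes.toBytes(triple._3))
    (new ImmutableBytesWritable(rowKey), p)
  }

  // The mapred TableOutputFormat configured in jobConfig writes each Put
  // to the table named by TableOutputFormat.OUTPUT_TABLE.
  def write(triples: RDD[(Int, String, String)], jobConfig: JobConf): Unit =
    triples.map(convert).saveAsHadoopDataset(jobConfig)
}
```

In the shell session above, the call would be `HBaseWrite.write(triples, jobConfig)`. The row key comes from the Put itself. The ImmutableBytesWritable key is filled with the same bytes only so that the pair is meaningful if it is inspected elsewhere.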




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


