hbase-user mailing list archives

From: yeshwanth kumar <yeshwant...@gmail.com>
Subject: Spark HBase Bulk load using HFileFormat
Date: Wed, 13 Jul 2016 23:29:13 GMT
Hi, I am doing a bulk load into HBase in HFile format using saveAsNewAPIHadoopFile.

I am on HBase 1.2.0-cdh5.7.0 and Spark 1.6.

When I try to write, I get the following exception:

 java.io.IOException: Added a key not lexically larger than previous.

Here is the code snippet:

case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)

// read the Avro input and keep only the columns of interest
val kAvroDF = sqlContext.read.format("com.databricks.spark.avro").load(args(0))
val kRDD = kAvroDF.select("seqid", "mi", "moc", "FID", "WID").rdd
// build a list of HBaseRow per input row via preparePUT
val trRDD = kRDD.map(a => preparePUT(a(1).asInstanceOf[String],
// flatten to (rowkey, KeyValue) pairs and write out as HFiles
val kvRDD = trRDD.flatMap(a => a).map(a => (a.rowKey, a.kv))
saveAsHFile(kvRDD, args(1))
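
For reference, saveAsHFile is essentially a thin wrapper around
saveAsNewAPIHadoopFile with HFileOutputFormat2. A simplified sketch (the
table name and the connection handling here are illustrative, not my exact
code):

import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

def saveAsHFile(rdd: RDD[(ImmutableBytesWritable, KeyValue)], path: String): Unit = {
  val conf = HBaseConfiguration.create()
  val job = Job.getInstance(conf, "hfile-bulk-load")
  val conn = ConnectionFactory.createConnection(conf)
  try {
    val table = TableName.valueOf("my_table") // illustrative table name
    // configure the output format against the target table's layout
    HFileOutputFormat2.configureIncrementalLoad(job,
      conn.getTable(table), conn.getRegionLocator(table))
    rdd.saveAsNewAPIHadoopFile(path,
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], job.getConfiguration)
  } finally {
    conn.close()
  }
}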

preparePUT returns a list of HBaseRow(ImmutableBytesWritable, KeyValue)
sorted on KeyValue. I flatMap over that RDD to get an
RDD[(ImmutableBytesWritable, KeyValue)] and pass it to saveAsHFile. A
simplified sketch of preparePUT follows.
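
preparePUT looks roughly like this (simplified; the column family "cf", the
Map argument, and the qualifiers are illustrative):

import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes

// emit one KeyValue per column, sorted within the row by KeyValue.COMPARATOR
def preparePUT(rowKey: String, cols: Map[String, String]): List[HBaseRow] = {
  val rk = Bytes.toBytes(rowKey)
  cols.toList
    .map { case (qual, value) =>
      new KeyValue(rk, Bytes.toBytes("cf"), Bytes.toBytes(qual), Bytes.toBytes(value))
    }
    .sortWith(KeyValue.COMPARATOR.compare(_, _) < 0)
    .map(kv => HBaseRow(new ImmutableBytesWritable(rk), kv))
}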

I also tried using the Put API, but it throws:

java.lang.Exception: java.lang.ClassCastException:
org.apache.hadoop.hbase.client.Put cannot be cast to
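
That attempt looked roughly like this (sketch; imports as in the saveAsHFile
sketch above, plus Put):

import org.apache.hadoop.hbase.client.Put

// Put-based attempt (sketch): wrap each KeyValue in a Put and write with
// the same output format
val putRDD = kvRDD.map { case (rk, kv) =>
  val put = new Put(rk.get())
  put.add(kv) // Put.add(Cell) in HBase 1.x
  (rk, put)
}
// fails at write time: HFileOutputFormat2 writes Cells, and Put is a
// Mutation, not a Cell, hence the ClassCastException
putRDD.saveAsNewAPIHadoopFile(args(1),
  classOf[ImmutableBytesWritable], classOf[Put],
  classOf[HFileOutputFormat2], HBaseConfiguration.create())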

Is there any way I can avoid the KeyValue API and still do the bulk load
into HBase? Please help me resolve this issue.

