hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark HBase Bulk load using HFileFormat
Date Thu, 14 Jul 2016 19:45:53 GMT
Please take a look at http://hbase.apache.org/book.html#dm.sort

In your second example, the column qualifier of current cell was not in
proper order.

On Thu, Jul 14, 2016 at 12:13 PM, yeshwanth kumar <yeshwanth43@gmail.com>
wrote:

> Hi  ,
>
> i have few questions regarding BulkLoad,
> does the Rows needs to be in sorted order or, the KeyValues  in the row
> needs to be in sorted order?
>
> sometimes i see exception between two different rowkeys, sometime i see
> exception between keyvalue pairs of same rowkey.
>
> for example
>
> current cell
>
> 123B51E8-574E-4029-BEA7-D0FF7B12DB30/C:Address/1468510623407/Put/vlen=176/seqid=0,
> lastCell =
>
> 694E24E2-7484-4926-B587-466990F1A017/C:Year/1468510623407/Put/vlen=4/seqid=0
>
> order mismatch is in between Keyvalues in two different rows,
>
> whereas
>
>  Current cell =
> 200065494/C:GENERALDEPENDENCYMEDIUM/1468522415075/Put/vlen=176/seqid=0,
>  lastCell = 200065494/C:R.PAIDPREP/1468522415075/Put/vlen=10/seqid=0
>
> order mismatch  is in between keyvalues of same row.
> in which sorted order HFileFormat is expecting the Data.??
>
>
>
>
>
>
>
> -Yeshwanth
> Can you Imagine what I would do if I could do all I can - Art of War
>
> On Thu, Jul 14, 2016 at 1:33 AM, yeshwanth kumar <yeshwanth43@gmail.com>
> wrote:
>
> >
> > following is the code snippet for saveASHFile
> >
> > def saveAsHFile(putRDD: RDD[(ImmutableBytesWritable, KeyValue)],
> outputPath: String) = {
> >   val conf = ConfigFactory.getConf
> >   val job = Job.getInstance(conf, "HBaseBulkPut")
> >   job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
> >   job.setMapOutputValueClass(classOf[Put])
> >   val connection = ConnectionFactory.createConnection(conf)
> >   val stTable= connection.getTable(TableName.valueOf("strecords"))
> >   val regionLocator = new HRegionLocator(TableName.valueOf("strecords"),
> connection.asInstanceOf[ClusterConnection])
> >   HFileOutputFormat2.configureIncrementalLoad(job, stTable,
> regionLocator)
> >
> >   putRDD.saveAsNewAPIHadoopFile(
> >     outputPath,
> >     classOf[ImmutableBytesWritable],
> >     classOf[Put],
> >     classOf[HFileOutputFormat2],
> >     conf)
> > }
> >
> > i just saw that i am using   job.setMapOutputValueClass(classOf[Put])
> >
> > where as i am writing KeyValue, does that cause any issue?
> >
> > i will update the code and will run it,
> >
> > can you suggest me sorting on partitions.
> >
> > Thanks,
> >
> > Yeshwanth
> >
> >
> > -Yeshwanth
> > Can you Imagine what I would do if I could do all I can - Art of War
> >
> > On Wed, Jul 13, 2016 at 7:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Can you show the code inside saveASHFile ?
> >>
> >> Maybe the partitions of the RDD need to be sorted (for 1st issue).
> >>
> >> Cheers
> >>
> >> On Wed, Jul 13, 2016 at 4:29 PM, yeshwanth kumar <yeshwanth43@gmail.com
> >
> >> wrote:
> >>
> >> > Hi i am doing bulk load into HBase as HFileFormat, by
> >> > using saveAsNewAPIHadoopFile
> >> >
> >> > i am on HBase 1.2.0-cdh5.7.0 and spark 1.6
> >> >
> >> > when i try to write i am getting an exception
> >> >
> >> >  java.io.IOException: Added a key not lexically larger than previous.
> >> >
> >> > following is the code snippet
> >> >
> >> > case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)
> >> >
> >> > val kAvroDF =
> >> > sqlContext.read.format("com.databricks.spark.avro").load(args(0))
> >> > val kRDD = kAvroDF.select("seqid", "mi", "moc", "FID", "WID").rdd
> >> > val trRDD = kRDD.map(a => preparePUT(a(1).asInstanceOf[String],
> >> > a(3).asInstanceOf[String]))
> >> > val kvRDD = trRDD.flatMap(a => a).map(a => (a.rowKey, a.kv))
> >> > saveAsHFile(kvRDD, args(1))
> >> >
> >> >
> >> > prepare put returns a list of HBaseRow(
> ImmutableBytesWritable,KeyValue)
> >> > sorted on KeyValue, where i do a flat map on the rdd and
> >> > prepare a RDD(ImmutableBytesWritable,KeyValue) and pass it to
> >> saveASHFile
> >> >
> >> > i tried using Put api,
> >> > it throws
> >> >
> >> > java.lang.Exception: java.lang.ClassCastException:
> >> > org.apache.hadoop.hbase.client.Put cannot be cast to
> >> > org.apache.hadoop.hbase.Cell
> >> >
> >> >
> >> > is there any i can skip using KeyValue Api,
> >> > and do the bulk load into HBase?
> >> > please help me in resolving this issue,
> >> >
> >> > Thanks,
> >> > -Yeshwanth
> >> >
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message