hbase-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Writing/Importing large number of records into HBase
Date Sat, 28 Jan 2017 03:47:17 GMT
Adding to Ted's reply: check the bulk put example -
https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
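On Jeff's pre-partitioning question below: a common approach is to salt the row key with a fixed-width hash bucket and pre-split the table on the bucket boundaries, so writes spread across all regions from the start instead of hot-spotting one region. This is only a minimal, self-contained sketch of that idea; the class and method names here are made up for illustration, and in a real job the split keys would be passed to HBase's `Admin#createTable(desc, splitKeys)` as byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplitSketch {

    // Salt a logical row key with a two-digit hash bucket prefix so
    // sequential keys fan out across regions instead of hitting one.
    static String saltedKey(String rowKey, int numBuckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), numBuckets);
        return String.format("%02d-%s", bucket, rowKey);
    }

    // Split points for table creation: one boundary per bucket edge,
    // yielding numBuckets pre-created regions (numBuckets - 1 splits).
    static List<String> splitKeys(int numBuckets) {
        List<String> splits = new ArrayList<>();
        for (int b = 1; b < numBuckets; b++) {
            splits.add(String.format("%02d-", b));
        }
        return splits;
    }

    public static void main(String[] args) {
        int buckets = 16;
        System.out.println(saltedKey("user12345", buckets));
        System.out.println("regions: " + (splitKeys(buckets).size() + 1));
    }
}
```

The trade-off to keep in mind: salting spreads write load evenly, but a later scan over a logical key range has to fan out one scan per bucket.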

On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you looked at hbase-spark module (currently in master branch) ?
>
> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/
> example/datasources/AvroSource.scala
> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/
> DefaultSourceSuite.scala
> for examples.
>
> There may be other options.
>
> FYI
>
> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi <jeffsaremi@hotmail.com>
> wrote:
>
> > Hi
> > I'm seeking some pointers/guidance on what we could do to insert billions
> > of records, which we already have in Avro files on Hadoop, into HBase.
> >
> > I read some articles online, and one of them recommended using the HFile
> > format. I took a cursory look at the documentation for that. Given its
> > complexity, I think that may be the last resort we want to pursue,
> > unless some library is out there that easily helps us write our files
> into
> > that format. I didn't see any.
> > Assuming that the HBase native client may be our best bet, is there any
> > advice around pre-partitioning our records, or other such techniques we
> could
> > use?
> > thanks
> >
> > Jeff
> >
>
