hbase-user mailing list archives

From Hamed Ghavamnia <ghavamni...@gmail.com>
Subject Re: adding data
Date Sat, 04 Aug 2012 07:44:35 GMT
Hi,
I'm facing a somewhat similar problem. I need to insert 15,000 rows per
second into HBase, and I'm getting really bad results using the simple Put
API (with multithreading). I've tried the MapReduce integration as well. The
problem seems to be the shape of my row keys: they are monotonically
increasing, which makes HBase write them all to the same region and
therefore to the same node. I've tried changing my keys to a more random
form, but HBase still stores them in the same region.
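To be concrete, the kind of change I tried looks roughly like this:
prefixing the increasing key with a small salt byte (simplified sketch; the
table, family and qualifier names are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedPutExample {
      private static final int BUCKETS = 16; // arbitrary; should match the region count

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // made-up table name
        table.setAutoFlush(false);                  // batch puts on the client

        for (long seq = 0; seq < 100000; seq++) {
          // one salt byte in front of the increasing key spreads writes
          // across BUCKETS key ranges instead of one
          byte salt = (byte) (seq % BUCKETS);
          byte[] rowKey = Bytes.add(new byte[] { salt }, Bytes.toBytes(seq));
          Put put = new Put(rowKey);
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                  Bytes.toBytes("value-" + seq));
          table.put(put);
        }
        table.flushCommits();
        table.close();
      }
    }
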
Any solutions would be appreciated. Some things that have crossed my mind:
1. Pre-split my regions (rough sketch below), but I'm not sure whether the
problem has anything to do with the regions.
2. Use the bulk load mentioned in your emails, but I don't know where to
start. Do you have a link to some sample code I could use?
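For option 1, is something like the following the right way to pre-split
(just a sketch; the table and family names are made up, and it assumes the
one-byte salt from the snippet above)?

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PresplitTableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable"); // made-up name
        desc.addFamily(new HColumnDescriptor("cf"));

        // 15 split points -> 16 regions, one per salt byte 0x00..0x0F
        byte[][] splits = new byte[15][];
        for (int i = 1; i <= 15; i++) {
          splits[i - 1] = new byte[] { (byte) i };
        }
        admin.createTable(desc, splits);
      }
    }
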
Any ideas?

On Sat, Aug 4, 2012 at 10:09 AM, anil gupta <anilgupta84@gmail.com> wrote:

> Hi Rita,
>
> HBase Bulk Loader is a viable solution for loading such a huge data set.
> Even if your import file has a separator other than tab, you can use
> ImportTsv as long as the separator is a single character. If you want to
> apply your own business logic while writing the data to HBase, you can
> write your own mapper class and use it with the bulk loader, so the bulk
> loader can be heavily customized to your needs.
> These links might be helpful for you:
> http://hbase.apache.org/book.html#arch.bulk.load
> http://bigdatanoob.blogspot.com/2012/03/bulk-load-csv-file-into-hbase.html
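>
> For example, the invocation would look something along these lines (the
> jar version, table name, column mapping and paths here are placeholders;
> this assumes a comma-separated file and writes HFiles for bulk loading):
>
>   hadoop jar hbase-VERSION.jar importtsv \
>     -Dimporttsv.separator=, \
>     -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
>     -Dimporttsv.bulk.output=/tmp/hfiles \
>     mytable /user/hadoop/input
>
>   hadoop jar hbase-VERSION.jar completebulkload /tmp/hfiles mytable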
>
> HTH,
> Anil Gupta
>
> On Fri, Aug 3, 2012 at 9:54 PM, Bijeet Singh <bijeetsingh@gmail.com>
> wrote:
>
> > Well, if your file contains tab-separated values, you can directly use
> > the ImportTsv utility of HBase to do a bulk load.
> > More details about that can be found here:
> >
> > http://hbase.apache.org/book/ops_mgt.html#importtsv
> >
> > The other option is to run an MR job on your file to generate the
> > HFiles, which you can later import into HBase using completebulkload.
> > HFiles are created using the HFileOutputFormat class. The output of the
> > map should be Put or KeyValue. For the reduce side you need to use
> > configureIncrementalLoad, which sets up the reduce tasks for you.
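> >
> > A rough outline of such a job, just to show the moving parts (the
> > class, table, family and path names are placeholders, not a tested
> > implementation):
> >
> >   import java.io.IOException;
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.fs.Path;
> >   import org.apache.hadoop.hbase.HBaseConfiguration;
> >   import org.apache.hadoop.hbase.client.HTable;
> >   import org.apache.hadoop.hbase.client.Put;
> >   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >   import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >   import org.apache.hadoop.io.LongWritable;
> >   import org.apache.hadoop.io.Text;
> >   import org.apache.hadoop.mapreduce.Job;
> >   import org.apache.hadoop.mapreduce.Mapper;
> >   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> >   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> >   public class HFileGenerator {
> >
> >     // Map: one tab-separated "key<TAB>value" line -> one Put
> >     public static class LineToPutMapper
> >         extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >       protected void map(LongWritable offset, Text line, Context ctx)
> >           throws IOException, InterruptedException {
> >         String[] fields = line.toString().split("\t", 2);
> >         byte[] row = Bytes.toBytes(fields[0]);
> >         Put put = new Put(row);
> >         put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"),
> >                 Bytes.toBytes(fields[1]));
> >         ctx.write(new ImmutableBytesWritable(row), put);
> >       }
> >     }
> >
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = HBaseConfiguration.create();
> >       Job job = new Job(conf, "hfile-generator");
> >       job.setJarByClass(HFileGenerator.class);
> >       job.setMapperClass(LineToPutMapper.class);
> >       job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> >       job.setMapOutputValueClass(Put.class);
> >       FileInputFormat.addInputPath(job, new Path(args[0]));
> >       FileOutputFormat.setOutputPath(job, new Path(args[1]));
> >
> >       // Sets the reducer, partitioner and HFileOutputFormat so the
> >       // generated HFiles line up with the table's current regions.
> >       HTable table = new HTable(conf, "mytable"); // placeholder table
> >       HFileOutputFormat.configureIncrementalLoad(job, table);
> >
> >       System.exit(job.waitForCompletion(true) ? 0 : 1);
> >     }
> >   }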
> >
> > Bijeet
> >
> >
> > On Sat, Aug 4, 2012 at 8:13 AM, Rita <rmorgan466@gmail.com> wrote:
> >
> > > I have a file with 13 billion rows of keys and values which I would
> > > like to place in HBase. I was wondering if anyone has a good
> > > MapReduce example for this sort of work.
> > >
> > >
> > > tia
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
