hbase-user mailing list archives

From Calvin <calvin.li...@gmail.com>
Subject Re: hbase bulk writes
Date Mon, 30 Nov 2009 23:33:25 GMT
Thanks for the responses.  If I can avoid writing a map-reduce job, that
would be preferable (getting map-reduce to work with / depend on my
existing infrastructure is turning out to be annoying).

I have no good way of randomizing my dataset since it's a very large stream
of sequential data (ordered by some key).  (For the archives, I've sketched
the randomizing job Ryan describes below the quoted thread.)  I have a fair
number of column families (~25), and every column is a long or a double.  A
standalone program that writes rows using the HTable / Put API runs at only
~2,000-5,000 rows/sec, which seems ridiculously slow.  Is it possible I am
doing something terribly wrong?
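
For reference, the writer boils down to something like the following (a
trimmed sketch; the table name, column family, and value layout are
placeholders for my real schema):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkWriter {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    // Buffer Puts client-side instead of paying one RPC per row.
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB; tune to taste

    for (long i = 0; i < 1000000; i++) {   // stand-in for the real input stream
      Put put = new Put(Bytes.toBytes(i));
      put.add(Bytes.toBytes("f1"), Bytes.toBytes("v"),
          Bytes.toBytes((double) i));
      table.put(put);                      // buffered; flushed when the buffer fills
    }
    table.flushCommits();                  // push anything still buffered
  }
}

Autoflush is off and the write buffer is sized up per the performance wiki,
so Puts go out in batches rather than one RPC each.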

-Calvin

On Mon, Nov 30, 2009 at 5:47 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Sequentially ordered rows are the worst insert case in HBase - you end
> up writing everything to one server even if you have 500.  If you can
> randomize your input (I have pasted a Randomize.java map-reduce that
> randomizes the lines of a file), your performance will improve.
>
> I have seen sustained inserts of 100-300k rows/sec on small rows
> before.  Large blob rows will obviously be slower, since the limiting
> factor is how fast we can write data to HDFS; what matters isn't the
> row count but the amount of data involved.
>
> Try Randomize.java and see where that gets you.  I think it's in the
> list archives.
>
> -ryan
>
>
> On Mon, Nov 30, 2009 at 2:41 PM, Jean-Daniel Cryans <jdcryans@apache.org>
> wrote:
> > Could you put your data in HDFS and load it from there with a
> > MapReduce job?
> >
> > J-D
> >
> > On Mon, Nov 30, 2009 at 2:33 PM, Calvin <calvin.lists@gmail.com> wrote:
> >> I have a large amount of sequentially ordered rows I would like to
> >> write to an HBase table.  What is the preferred way to do bulk
> >> writes of multi-column tables in HBase?  Using the get/put interface
> >> seems fairly slow even if I batch writes with table.put(List<Put>).
> >>
> >> I have followed the directions on:
> >>   * http://wiki.apache.org/hadoop/PerformanceTuning
> >>   * http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
> >>
> >> Are there any other resources for improving the throughput of my bulk
> >> writes?  On
> >> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
> >> I see there's a way to write HFiles directly, but HFileOutputFormat
> >> can only write a single column family at a time
> >> (https://issues.apache.org/jira/browse/HBASE-1861).
> >>
> >> Thanks!
> >>
> >> -Calvin
> >>
> >
>
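
(For the archives: I couldn't locate Randomize.java, so below is my own
sketch of the job Ryan describes - tag each line with a random key so the
shuffle/sort scatters the lines, then drop the key on output.  Class and
name choices here are mine, not the original's.)

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RandomizeLines {
  // Tag every input line with a random key; the shuffle reorders them.
  static class RandomKeyMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    private final Random rand = new Random();
    private final LongWritable key = new LongWritable();

    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      key.set(rand.nextLong());
      ctx.write(key, line);
    }
  }

  // Drop the random key and emit the lines in their new, shuffled order.
  static class DropKeyReducer
      extends Reducer<LongWritable, Text, Text, NullWritable> {
    protected void reduce(LongWritable key, Iterable<Text> lines, Context ctx)
        throws IOException, InterruptedException {
      for (Text line : lines) {
        ctx.write(line, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "randomize-lines");
    job.setJarByClass(RandomizeLines.class);
    job.setMapperClass(RandomKeyMapper.class);
    job.setReducerClass(DropKeyReducer.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Run as, e.g., hadoop jar <yourjob.jar> RandomizeLines <in> <out>.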
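
(Also for the archives: the HFileOutputFormat path mentioned above would
look roughly like the sketch below - a map-only job that turns
already-sorted "rowkey,value" text into HFiles for a single column family,
one family per job until HBASE-1861 is resolved.  All names are
placeholders, and I haven't run this against a cluster.)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileWriteSketch {
  static class ToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    private static final byte[] FAMILY = Bytes.toBytes("f1");   // placeholder
    private static final byte[] QUALIFIER = Bytes.toBytes("v"); // placeholder

    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2); // "rowkey,value"
      byte[] row = Bytes.toBytes(parts[0]);
      ctx.write(new ImmutableBytesWritable(row),
          new KeyValue(row, FAMILY, QUALIFIER, Bytes.toBytes(parts[1])));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "hfile-write-sketch");
    job.setJarByClass(HFileWriteSketch.class);
    job.setMapperClass(ToKeyValueMapper.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);
    job.setNumReduceTasks(0); // map-only: input is already sorted by row key
    job.setOutputFormatClass(HFileOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}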
