hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: HBASE mapReduce stoppage
Date Wed, 20 May 2015 23:24:30 GMT
Hi Dillon,

From the behavior you are describing, it sounds like your table was not
pre-split. When you say you are bulk loading the data using MR, is this an
MR job that does Put(s) into HBase, or one that only generates HFiles that
are later bulk loaded via the completebulkload command? (With importtsv you
have both options.)

Have you looked into https://hbase.apache.org/book.html#arch.bulk.load for
how to perform a bulk load into HBase?
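
To illustrate what pre-splitting means for a salted table, here is a minimal
sketch in Python. It assumes the salt is a fixed-width, zero-padded decimal
prefix on the row key (for example "00" to "19"); the bucket count, prefix
width, table name and column family below are hypothetical and need to be
adapted to the actual key design. It only computes split keys and formats
them for the HBase shell `create ... SPLITS => [...]` syntax; it does not
talk to a cluster.

```python
from bisect import bisect_right


def salted_split_keys(num_buckets, width=2):
    """Split keys that give one region per salt bucket.

    Assumes (hypothetically) that row keys start with a zero-padded
    decimal salt of `width` digits, e.g. b"07" + original_key.
    """
    if num_buckets > 10 ** width:
        raise ValueError("salt width too small for the bucket count")
    # N buckets need N-1 split points; the first region starts at the
    # empty key, so bucket 0 needs no explicit split key.
    return [str(b).zfill(width).encode("ascii") for b in range(1, num_buckets)]


def region_index(row_key, split_keys):
    """Index of the region a row key falls into, given sorted split keys."""
    # A region includes its start key, hence bisect_right.
    return bisect_right(split_keys, row_key)


def shell_create_statement(table, family, split_keys):
    """Format an HBase shell `create` statement with explicit SPLITS."""
    splits = ", ".join("'%s'" % k.decode("ascii") for k in split_keys)
    return "create '%s', '%s', SPLITS => [%s]" % (table, family, splits)


if __name__ == "__main__":
    keys = salted_split_keys(4)
    # 'mytable' and 'cf' are placeholder names.
    print(shell_create_statement("mytable", "cf", keys))
    print(region_index(b"02some-row", keys))
```

With the regions created up front, an HFile-generating job can write to all
of them in parallel from the first file onward, instead of funnelling
everything into a single region that has to split as it grows.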

cheers,
esteban.

--
Cloudera, Inc.


On Wed, May 20, 2015 at 3:01 PM, <dchrimes@uvic.ca> wrote:

> We are bulk loading 1 billion rows into HBase. The 1-billion-row file was
> split into 20 files of ~22.5GB each. Ingesting the file into HDFS took
> ~2 min. Ingesting the first file into HBase took ~3 hours. The next took
> ~5 hours, and the time keeps increasing. By the sixth or seventh file the
> ingestion just stops (the MapReduce bulk load stops at 99% of the mappers
> and around 22% of the reducers). We also noticed that as soon as the
> reducers start, the progress of the job slows down.
>
> The logs did not show any problem and we do not see any hot spotting (the
> table is already salted). We are running out of ideas. A few questions to
> get started:
> 1- Is the increase in MR time expected? Does MR need to sort the new data
> against the already ingested data?
> 2- Is there a way to speed this up, especially since our data is already
> sorted? From 2 min on HDFS to 5 hours on HBase is a big gap. A word count
> MapReduce job on 24GB took only ~7 minutes. Removing the reducers from the
> existing CSV bulk load will not help, as the mappers will spit out the
> data in a random order.
>
> regards,
>
> Dillon
>
> Dillon Chrimes (PhD)
> University of Victoria
> Victoria BC Canada
>
