hbase-user mailing list archives

From Nathan Harkenrider <nathan.harkenri...@gmail.com>
Subject Data Loss During Bulk Load
Date Sun, 21 Mar 2010 22:50:31 GMT
Hi All,

I'm currently running into data loss issues when bulk loading data into
HBase. I'm loading data via a Map/Reduce job that is parsing XML and
inserting rows into 2 HBase tables. The job is currently configured to run
30 mappers concurrently (3 per node) and is inserting at a rate of
approximately 6000 rows/sec. The Map/Reduce job appears to run correctly;
however, when I run the HBase rowcounter job on the tables afterwards, the
row count is lower than expected. The data loss is small percentage-wise
(~200,000 rows out of 80,000,000) but concerning nevertheless.
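
For scale, the figures quoted above can be worked out in a few lines. This is
only a back-of-the-envelope sketch: the variable names are illustrative, and it
assumes the insert rate is spread evenly across the 30 mappers, which the
message does not state.

```python
# Figures taken from the message; names are illustrative, not from the job.
expected_rows = 80_000_000   # rows the job should have written
missing_rows = 200_000       # approximate shortfall reported by rowcounter
insert_rate = 6_000          # rows/sec across the whole job (approximate)
mappers = 30                 # concurrent mappers (3 per node)

# Fraction of rows that never showed up in the row count.
loss_fraction = missing_rows / expected_rows

# Assumption: the load is evenly distributed across mappers.
per_mapper_rate = insert_rate / mappers

print(f"loss: {loss_fraction:.2%}")
print(f"per-mapper insert rate: {per_mapper_rate:.0f} rows/sec")
```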

I managed to locate the following errors in the regionserver logs related to
failed compactions and/or splits.

I'm running HBase 0.20.3 and Cloudera CDH2 on CentOS 5.4. The cluster
consists of 11 machines: 1 master and 10 region servers. Each machine has 8
cores and 8 GB of RAM.

Any advice is appreciated. Thanks,

Nathan Harkenrider
