hbase-user mailing list archives

From Paul Mackles <pmack...@adobe.com>
Subject RE: export/import for backup
Date Mon, 20 Feb 2012 21:58:16 GMT
Import was run as an M/R job on the same configuration as the export (15 nodes, 5 tasks per
node). Nodes are 8 cores with 23GB of total RAM (6GB for hbase RS). As far as I could tell,
everything was running pretty balanced and hbase was the bottleneck due to all of the compaction.

Actually, an hbase export to "bulk load" facility sounds like a great idea. We have been using
bulk loads to migrate data from an older data store and they have worked awesome for us. It
also doesn't seem like it would be that hard to implement. So what am I missing?


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Monday, February 20, 2012 4:29 PM
To: user@hbase.apache.org
Subject: Re: export/import for backup

On Mon, Feb 20, 2012 at 1:20 PM, Paul Mackles <pmackles@adobe.com> wrote:
> We are on hbase 0.90.4 (cdh3u2). We are using the standard hbase export/import for backups.
> In a test run, our imports ran extremely slowly. While a full export of our dataset took about
> an hour, the corresponding import took 20+ hours (for 216 regions across 15 servers). Although
> it finished, I am a little uncomfortable with that sort of recovery time should disaster strike.
> Are there any recommendations for speeding up imports in a recovery scenario? One thing I
> noticed while watching the region-server logs was that there were a lot of compactions happening
> during the import (both major and minor). Should we disable compactions while the import is
> running and then do them all at the end? We have our region-size set to 100GB right now so we
> can manage splitting. Thanks in advance for any recommendations.

Can you tell where it was spending the time, Paul?  Upping the config so
there is less flushing sounds like it might be a good way to go.  You
might want to set a larger flush size when importing so that flushes are
larger and less frequent.  How did you import?  An MR job?  Was it running
full on?  Was HBase what was keeping it slow?
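
A rough back-of-the-envelope sketch of the flush-size idea, in Python. It
uses the figures quoted in this thread (216 regions, 15 region servers, 6GB
region server heap) to estimate how much memstore each region could hold
before the region server starts forcing flushes, and how the number of
flushes changes with the flush size. Two things here are assumptions and not
from the thread: the 0.4 heap fraction (what I believe
hbase.regionserver.global.memstore.upperLimit defaulted to in this era) and
the 10GB written per region, which is purely hypothetical. It also assumes
all regions receive writes evenly.

```python
import math

# Figures quoted in this thread.
REGIONS = 216
REGION_SERVERS = 15
RS_HEAP_MB = 6 * 1024  # 6GB of heap per region server

# Assumption (not from the thread): fraction of region server heap that
# all memstores together may occupy before flushes are forced.
GLOBAL_MEMSTORE_FRACTION = 0.4


def per_region_memstore_budget_mb(regions, servers, heap_mb, fraction):
    """Memstore heap available per region if every region fills evenly."""
    regions_per_server = regions / servers
    return heap_mb * fraction / regions_per_server


def flushes_needed(data_mb, flush_size_mb):
    """Number of flushes required to write data_mb at a given flush size."""
    return math.ceil(data_mb / flush_size_mb)


budget = per_region_memstore_budget_mb(
    REGIONS, REGION_SERVERS, RS_HEAP_MB, GLOBAL_MEMSTORE_FRACTION
)
print(f"per-region memstore budget: about {budget:.0f} MB")

# Hypothetical volume written to a single region during the import.
DATA_PER_REGION_MB = 10 * 1024
for flush_size_mb in (64, 128):
    count = flushes_needed(DATA_PER_REGION_MB, flush_size_mb)
    print(f"flush size {flush_size_mb} MB -> {count} flushes")
```

Under these assumptions the per-region budget comes out at roughly 171 MB, so
a flush size much above that would probably not take effect without also
giving the region servers more heap. Doubling the flush size halves the
number of flushes, and fewer, larger store files should mean less
compaction work.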

Has anyone played with going from an export to a bulk load?  I wonder
whether that would run faster.

