hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: createTable with specified region splits: works great
Date Tue, 15 Feb 2011 18:54:47 GMT
That's a great report Matt, thanks for sharing!


On Tue, Feb 15, 2011 at 10:52 AM, Matt Wheeler
<matt.wheeler@explorysmedical.com> wrote:
> Pre-creating regions using the byte[][] overload of createTable more or less doubled
> the performance of our main index table generation.  Our keys start with hashes of the original
> record IDs, so the data can be evenly distributed between all regions.  The keys are ASCII
> strings starting with the hash value in hexadecimal, so we specify split keys as zero-padded
> ASCII strings with equal length.
> We try to select an initial region count that will avoid any region splits during the
> index MR job, without making the table larger than it needs to be.  Performance suffered
> when we created the table with about 3 times more regions than necessary.
> - matt
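
For readers of the archive, here is a minimal sketch of the split-key scheme described in the quoted message: evenly spaced, zero-padded, equal-length ASCII hex strings. It is not Matt's actual code. The class name HexSplitKeys, the method hexSplitKeys, the use of lowercase hex digits, and the prefix width are all assumptions made for illustration. The sketch has no HBase dependency; the byte[][] it returns is the kind of value one would pass as the second argument of HBaseAdmin.createTable(HTableDescriptor, byte[][]).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original thread.
final class HexSplitKeys {

    // Returns numRegions - 1 split keys that divide the space of
    // `width`-digit hex prefixes into roughly equal ranges.
    // Assumes row keys begin with a lowercase, zero-padded hex hash.
    static byte[][] hexSplitKeys(int numRegions, int width) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be between 1 and 8");
        }
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be at least 1");
        }
        long space = 1L << (4 * width); // number of distinct hex prefixes
        if (numRegions > space) {
            throw new IllegalArgumentException("more regions than distinct prefixes");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary, rounded down; fits in a long because width <= 8
            long boundary = space * i / numRegions;
            // Zero-pad so every key has the same length and sorts
            // lexicographically in the same order as its numeric value.
            String key = String.format("%0" + width + "x", boundary);
            splits[i - 1] = key.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Example: 4 regions over 2-digit hex prefixes.
        for (byte[] split : hexSplitKeys(4, 2)) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
    }
}
```

Even spacing only balances the regions if the hash prefix is uniformly distributed, which is the situation the message describes. The region count itself still has to be chosen to fit the expected data size, since the message reports that about 3 times more regions than necessary hurt performance.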
