hbase-user mailing list archives

From Andy Sautins <andy.saut...@returnpath.net>
Subject Performance of region merges...
Date Mon, 28 Mar 2011 00:06:19 GMT

    I have an issue I'm hoping to get some insight into.  We currently have a table with
roughly 18k regions.  When we originally created the table we didn't realize we should make
the regions bigger, and we have subsequently changed MAX_FILESIZE to something larger.  We are
no longer rapidly creating new regions, but we still have the large number of existing regions.
I've been investigating using the merge tool to reduce the number of regions to something
more reasonable for our needs.  The issue I've run into is that the merge tool seems to run
somewhat slowly.  On a test table that holds a sample of the data in our main table, I have
roughly 8 million rows of approximately 1k each across 48 regions.  Using the merge tool I can
reduce the number of regions to 24 by running it over pairs of regions, and all seems to work
well.  However, for those 48 regions it takes roughly 30 minutes.  It's not the end of the
world for this table if it takes a while, but given that the cluster needs to be offline
while the merge tool runs, merging has a larger impact than I'd like it to have.
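
    To put the test numbers in perspective, here is a rough back-of-the-envelope sketch.
It assumes merge time scales linearly with the number of pairwise merges and that regions
in the main table are comparable in size to those in the test table; neither assumption
has been verified.  The target of halving the 18k regions and the function names are
purely illustrative.

```python
def seconds_per_merge(regions_before, regions_after, total_minutes):
    """Average wall-clock seconds per pairwise merge in a measured run."""
    # Each pairwise merge removes exactly one region from the table.
    merges = regions_before - regions_after
    return total_minutes * 60 / merges


def estimate_hours(regions_before, regions_after, secs_per_merge):
    """Extrapolated offline time in hours, assuming linear scaling (unverified)."""
    merges = regions_before - regions_after
    return merges * secs_per_merge / 3600


# Measured on the test table: 48 regions down to 24 in roughly 30 minutes.
per_merge = seconds_per_merge(48, 24, 30)

# Hypothetical target for the main table: halve roughly 18k regions.
hours = estimate_hours(18_000, 9_000, per_merge)

print(f"{per_merge:.0f} s per merge, about {hours:.1f} hours offline")
```

    If those assumptions hold even approximately, one pass of pairwise merges over the main
table would keep the cluster offline for days rather than minutes.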

    I guess my question is: if I have a lot more regions than I want, is there a way
to merge them down to a smaller number in a reasonably efficient manner?  Can I run
the merge tool on multiple regions at the same time?  Are there alternatives to the merge
tool?  Could I export/import the data, or use some other method?

    We are currently running HBase 0.90.1.

   Any insights would be much appreciated.

