hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Performance of region merges...
Date Mon, 28 Mar 2011 03:17:39 GMT
It seems HMerge doesn't have a main(), so it can't be launched directly from
the command line. Refer to TestMergeTable.testMergeTable() for usage:
      HMerge.merge(c, FileSystem.get(c), desc.getName());

Cheers

On Sun, Mar 27, 2011 at 7:57 PM, Andy Sautins
<andy.sautins@returnpath.net> wrote:

>
>  Thank you Ted.  I had not heard of HMerge yet but will take a look.
>
>  I appreciate the help.
>
>   Andy
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Sunday, March 27, 2011 8:54 PM
> To: user@hbase.apache.org
> Subject: Re: Performance of region merges...
>
> Merge.java currently only accepts two regions.
>
> Have you looked at HMerge?
> Its precondition seems to satisfy your requirement:
>   * When merging a normal table, the HBase instance must be online, but the
>   * table must be disabled.
>
>
> On Sun, Mar 27, 2011 at 5:06 PM, Andy Sautins
> <andy.sautins@returnpath.net> wrote:
>
> >
> >    I have an issue I'm hoping to get some insight into.  We currently have
> > a table that has roughly 18k regions.  When we originally created the table
> > we didn't realize we should make the regions bigger, and have subsequently
> > changed MAX_FILESIZE to something larger.  We are no longer rapidly creating
> > new regions, but we still have the large number of regions.  I've been
> > investigating using the merge tool to try to reduce the number of regions to
> > something more reasonable for our needs.  The issue I've run into is that
> > the merge tool seems to run somewhat slowly.  On a test table that has a
> > sample of the data in our main table I have roughly 8MM rows, each
> > approximately 1k, across 48 regions.  Using the merge tool I can reduce the
> > number of regions down to 24 by running the merge tool over pairs of regions
> > and all seems to work well.  However, for those 48 regions it takes roughly
> > 30 minutes.  It's not the end of the world for this table if it takes a
> > while, but given the fact that the cluster needs to be offline when using
> > the merge tool, merging has a larger impact than I'd like it to have.
> >
> >    I guess the question I have is: if I have a lot more regions than I want,
> > is there a way to merge the regions down to a smaller number in a reasonably
> > efficient manner?  Can I run the merge tool on multiple regions at the same
> > time?  Are there alternatives to the merge tool?  Could I export/import the
> > data or use some other method?
> >
> >   We are currently running 0.90.1.
> >
> >   Any insights would be much appreciated.
> >
> >   Andy
> >
>
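
For a rough sense of scale, the timing reported in the original question (48
regions merged down to 24 in roughly 30 minutes) can be extrapolated to the
18k-region table. This is only a back-of-the-envelope sketch: it assumes each
pairwise merge costs about the same as on the test table and that merges run
one at a time. The 9,000-region target and the class and method names are
hypothetical, chosen purely for illustration.

```java
import java.time.Duration;

// Hypothetical helper: not part of HBase, just arithmetic on the thread's numbers.
final class MergeEstimate {

    // Each pairwise merge replaces two regions with one, so it removes exactly one region.
    static long pairwiseMerges(long currentRegions, long targetRegions) {
        if (targetRegions < 1 || targetRegions > currentRegions) {
            throw new IllegalArgumentException("target must be between 1 and the current region count");
        }
        return currentRegions - targetRegions;
    }

    // Linear extrapolation from an observed run; assumes a constant cost per merge.
    static Duration estimate(long merges, long observedMerges, Duration observedTime) {
        return observedTime.multipliedBy(merges).dividedBy(observedMerges);
    }

    public static void main(String[] args) {
        // Observed in the thread: 48 regions down to 24 in roughly 30 minutes.
        long observedMerges = pairwiseMerges(48, 24);
        Duration observedTime = Duration.ofMinutes(30);

        // Assumed target: halving the 18k-region table (the thread names no target).
        long needed = pairwiseMerges(18_000, 9_000);
        Duration total = estimate(needed, observedMerges, observedTime);

        System.out.println(needed + " merges, roughly " + total.toHours() + " hours");
    }
}
```

Under these assumptions, halving the table with serial pairwise merges would
take on the order of days, which is why a tool that works through a whole
disabled table, as HMerge is described as doing above, looks more attractive
than invoking the two-region merge tool repeatedly.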
