hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Need help trying to balance HBase RegionServer load
Date Thu, 17 Jun 2010 16:15:04 GMT
It's similarly stone age, but you can use the "close_region" command from
the shell to get those regions to close. They may reopen on a different
server (though it's possible they'll just reopen in the same place!)

Jonathan and the FB folks are working on improving load balancing in trunk,
so hopefully it will be better in the next release.

-Todd
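
To check where regions actually land after a split or a close, the "Assigning region ... to ..." lines in the master log can be tallied per region server. The following is a minimal Python sketch, assuming the message format shown in the quoted log below; the function name `regions_by_server` and the regular expression are illustrative assumptions, not part of HBase, and other HBase versions may log assignments differently.

```python
import re
from collections import defaultdict

# Pattern based on the master log lines quoted in this thread (an assumption
# about the format, not an HBase API).
ASSIGN_RE = re.compile(r"Assigning region (\S+) to (\S+)")


def regions_by_server(log_text):
    """Group assigned region names by the server they were assigned to."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    placement = defaultdict(list)
    for region, server in ASSIGN_RE.findall(flat):
        placement[server].append(region)
    return dict(placement)


# The two assignment lines from the quoted log, unwrapped.
SAMPLE = (
    "2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager: "
    "Assigning region "
    "crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 "
    "to cm-hadoop15.mozilla.org,60020,1276778868841\n"
    "2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager: "
    "Assigning region "
    "crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 "
    "to cm-hadoop15.mozilla.org,60020,1276778868841\n"
)

if __name__ == "__main__":
    for server, regions in regions_by_server(SAMPLE).items():
        print(server, len(regions))
```

A server that appears with many regions sharing the same date prefix is a candidate for the manual close_region approach described above.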

On Thu, Jun 17, 2010 at 9:05 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> On Thu, Jun 17, 2010 at 11:58 AM, Daniel Einspanjer <deinspanjer@mozilla.com> wrote:
>
> >  Here is an example of a region split with both daughters being assigned to
> > the same region server.  Is this expected?
> >
> > 2010-06-17 08:34:53,060 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS:
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276776160508:
> > Daughters;
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647,
> > crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
> from
> > cm-hadoop14.mozilla.org,60020,1276560962019; 1 of 1
> > 2010-06-17 08:34:54,316 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > Assigning region
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
> to
> > cm-hadoop15.mozilla.org,60020,1276778868841
> > 2010-06-17 08:34:54,316 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > Assigning region
> > crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
> to
> > cm-hadoop15.mozilla.org,60020,1276778868841
> > 2010-06-17 08:34:55,432 INFO
> > org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN:
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
> from
> > cm-hadoop15.mozilla.org,60020,1276778868841;
> > 1 of 1
> > 2010-06-17 08:34:55,432 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation:
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
> open
> > on 10.2.72.74:60020
> > 2010-06-17 08:34:55,436 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> > crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
> in
> > region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
> > 2010-06-17 08:34:56,044 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > Processing MSG_REPORT_OPEN:
> > crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
> from
> > cm-hadoop15.mozilla.org,60020,1276778868841;
> > 1 of 1
> > 2010-06-17 08:34:56,044 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation:
> > crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
> open
> > on 10.2.72.74:60020
> > 2010-06-17 08:34:56,048 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> > crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
> in
> > region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020
> >
> >
> >
> > On 6/17/10 11:42 AM, Daniel Einspanjer wrote:
> >
> >>  Currently, in our production cluster, almost all of the traffic for a
> day
> >> ends up assigned to a single RS and that causes the load on that machine
> to
> >> be too high.
> >>
> >> With our last release, we salted our rowkeys so that rather than
> starting
> >> with the date:
> >> 100617<guid>
> >> they now start with the first letter of the guid followed by the date:
> >> e100617<guid_that_starts_with_e>
> >>
> >> When I look at the region assignments though, I see a single server
> >> assigned the following regions:
> >> 0100617...
> >> 1100617...
> >> 2100617...
> >> 3100617...
> >> 4100617...
> >> ...
> >> d100617...
> >> e100617...
> >> f100617...
> >>
> >> Is there anything we can do to try to get the cluster to shuffle this up
> >> some more?
> >> We are getting compaction times in the minutes (one I saw was over 12
> >> minutes) and this causes our clients to time out and shut down which
> causes
> >> production outages.
> >>
> >> -Daniel
> >>
> >
> Here comes a stone-age, stopgap suggestion. If you shut down the region
> server, its regions will move, but there is a period of time during which
> those regions are inaccessible, so that is never good.
>
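
The rowkey salting described in the oldest quoted message (first character of the GUID, then the date, then the GUID) can be sketched as follows. This is a minimal Python illustration; the function names `salted_rowkey` and `salt_histogram` are hypothetical and not taken from the poster's code.

```python
from collections import Counter


def salted_rowkey(date_yymmdd, guid):
    """Build a key of the form <first char of guid><date><guid>."""
    return guid[0] + date_yymmdd + guid


def salt_histogram(rowkeys):
    """Count how many keys fall under each one-character salt prefix."""
    return Counter(key[0] for key in rowkeys)


if __name__ == "__main__":
    # GUID taken from a region name in the quoted log.
    print(salted_rowkey("100617", "2700f355-1d02-485a-90d9-0e8182100617"))
    # Hypothetical keys, used only to show the per-prefix spread.
    keys = [salted_rowkey("100617", c + "000") for c in "0123456789abcdef"]
    print(sorted(salt_histogram(keys)))
```

With hexadecimal GUIDs this yields at most 16 distinct prefixes for a given day. The salt only splits the day's writes across key ranges; as the thread shows, it does not by itself decide which region server hosts each range.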



-- 
Todd Lipcon
Software Engineer, Cloudera
