Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B928CD27A for ; Tue, 17 Jul 2012 15:05:14 +0000 (UTC) Received: (qmail 9525 invoked by uid 500); 17 Jul 2012 15:05:13 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 9280 invoked by uid 500); 17 Jul 2012 15:05:12 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 9263 invoked by uid 99); 17 Jul 2012 15:05:12 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jul 2012 15:05:12 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of yuzhihong@gmail.com designates 74.125.82.169 as permitted sender) Received: from [74.125.82.169] (HELO mail-we0-f169.google.com) (74.125.82.169) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jul 2012 15:05:07 +0000 Received: by weys10 with SMTP id s10so466320wey.14 for ; Tue, 17 Jul 2012 08:04:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=rymdom93n/6XLqxTRKFUt0B9LGz+c5pYpxdAkxWorhE=; b=FEA1Hw99ezJQV9E0R3rHiUxMCsA2XLZ0pC3FPBZGqjIF7RpnsOAb/IleQIdulp0EOE HCgkDIkfahQFhABF89lI25ZtvvQChQkzq+0EJHOYRHCqPxY2s8WOzWJw3ZYo2a0ECn2c CPGKpO5dGhJa9wXxASmeVoZPEZikxbfMvCzaK8uGyBDdhU4XRkwuxT53Zppt5DM7FBCh 71axCflNOB5F4s5fPqthk3AwG7HRKDRp+aybxVnFWNwoYAtCP/YJN9ItWtc5qXdYgGpO qee1FM92HDQbfFdDk62JrzTyw1bFFdN7wYefj+k2DJSSXCg8rXfcjUkfFrwmFk3TvW/D U5kA== MIME-Version: 1.0 Received: by 10.180.109.129 with SMTP id hs1mr5066212wib.0.1342537485165; Tue, 17 Jul 2012 08:04:45 -0700 (PDT) Received: by 10.216.90.85 with HTTP; Tue, 17 Jul 2012 08:04:45 -0700 (PDT) In-Reply-To: <50057734.08cb440a.4bcd.53f6SMTPIN_ADDED@mx.google.com> References: <50057734.08cb440a.4bcd.53f6SMTPIN_ADDED@mx.google.com> Date: Tue, 17 Jul 2012 08:04:45 -0700 Message-ID: Subject: Re: Load balancer repeatedly close and open region in the same regionserver. From: Ted Yu To: user@hbase.apache.org Content-Type: multipart/alternative; boundary=e89a8f13ea88e6a53c04c507dd62 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f13ea88e6a53c04c507dd62 Content-Type: text/plain; charset=ISO-8859-1 Howard: Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out yesterday ? I guess you have noticed the following toward the end of log snippet: 2012-07-16 00:17:50,774 DEBUG org.apache.hadoop.hbase. master.handler.OpenedRegionHandler: Handling OPENED event for trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. from 192.168.1.2,60020,1342017399608; deleting unassigned node As Ram pointed out, there might be two region server processes running on 192.168.1.2 Please verify whether that was the case. Cheers On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < ramkrishna.vasudevan@huawei.com> wrote: > From the logs I can see that though the server's are same their start code > is different. > Need to analyse the previous logs also. Pls file a JIRA, if possible > attach > the logs to that. > > Thanks Howard. > > Regards > Ram > > > -----Original Message----- > > From: Howard [mailto:rj03hou@gmail.com] > > Sent: Tuesday, July 17, 2012 7:32 PM > > To: user@hbase.apache.org > > Subject: Re: Load balancer repeatedly close and open region in the same > > regionserver. > > > > *hi,Ted Yu,thanks for your reply:* > > the hbase and hadoop version is > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > > the following is a detail log about the same region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956, > > and it repeats again and again.: > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: > > balance > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:49,843 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment > > of > > region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > (offlining) > > 2012-07-16 00:12:49,843 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > 2012-07-16 00:12:49,845 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > 192.168.1.2,60020,1342017399608 for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling > > CLOSED > > event for 93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956. > > state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node > > for > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=M_ZK_REGION_OFFLINE, server=10.75.18.34,60000,1342017369575, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing > > plan > > for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > destination server is 192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing > > plan > > for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956.; > > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 > > 5e160e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > to 192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:50,574 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENING, > > server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,635 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENING, > > server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,639 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENED, server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,639 DEBUG > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling > > OPENED > > event for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > from 192.168.1.2,60020,1342017399608; deleting unassigned node > > 2012-07-16 00:12:50,640 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node for > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state > > RS_ZK_REGION_OPENED > > 2012-07-16 00:12:50,641 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > has been deleted. > > 2012-07-16 00:12:50,641 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned node for > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state > > RS_ZK_REGION_OPENED > > 2012-07-16 00:12:50,641 INFO > > org.apache.hadoop.hbase.master.AssignmentManager: The master has opened > > the > > region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > that was online on 192.168.1.2,60020,1342017399608 > > 2012-07-16 00:17:49,870 INFO org.apache.hadoop.hbase.master.HMaster: > > balance > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > 2012-07-16 00:17:49,870 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment > > of > > region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > (offlining) > > 2012-07-16 00:17:49,870 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > 2012-07-16 00:17:49,872 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > 192.168.1.2,60020,1342017399608 for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > 2012-07-16 00:17:50,464 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,464 DEBUG > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling > > CLOSED > > event for 93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,464 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956. > > state=CLOSED, ts=1342369070465, server=192.168.1.2,60020,1342017399608 > > 2012-07-16 00:17:50,464 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node > > for > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > 2012-07-16 00:17:50,467 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=M_ZK_REGION_OFFLINE, server=10.75.18.34,60000,1342017369575, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,467 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing > > plan > > for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > destination server is 192.168.1.2,60020,1342002082592 > > 2012-07-16 00:17:50,467 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing > > plan > > for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956.; > > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 > > 5e160e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > 2012-07-16 00:17:50,467 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > to 192.168.1.2,60020,1342002082592 > > 2012-07-16 00:17:50,509 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENING, > > server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,761 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENING, > > server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,774 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_OPENED, server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:17:50,774 DEBUG > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling > > OPENED > > event for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > from 192.168.1.2,60020,1342017399608; deleting unassigned node > > 2012-07-16 00:17:50,774 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node for > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state > > RS_ZK_REGION_OPENED > > 2012-07-16 00:17:50,775 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > has been deleted. > > 2012-07-16 00:17:50,775 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned node for > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state > > RS_ZK_REGION_OPENED > > 2012-07-16 00:17:50,775 INFO > > org.apache.hadoop.hbase.master.AssignmentManager: The master has opened > > the > > region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > that was online on 192.168.1.2,60020,1342017399608 > > 2012-07-16 00:22:49,916 INFO org.apache.hadoop.hbase.master.HMaster: > > balance > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > .... > > > > 2012/7/17 Ted Yu > > > > > Which HBase version were you using ? > > > > > > The log messages below were in very short time period. Can you > > provide more > > > logs ? > > > > > > Thanks > > > > > > On Tue, Jul 17, 2012 at 5:09 AM, Howard wrote: > > > > > > > Through the master and regionserver log,I find load balancer > > repeatedly > > > > close and open region in the same regionserver(period in > > > > hbase.balancer.period ). > > > > Does this is a bug in load balancer and how can I dig into or avoid > > this? > > > > The log like this: > > > > 2012-07-16 00:12:50,055 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zOwMwdi,1342294447440.7608037379153932a30f4bdd6d7e > > df5d. > > > > 2012-07-16 00:12:50,069 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_comment_list,zO7vqv1,1342192347676.0c9036e183055ff2df93eca348a > > eeada. > > > > 2012-07-16 00:12:50,206 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zOzYDL9,1342294447440.47eb162f6aecffcddfa897fdbb70 > > 472b. > > > > 2012-07-16 00:12:50,301 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zOM1HZP,1342293292587.9af5de3db0bad0470d9b8fe6b9ea > > df14. > > > > 2012-07-16 00:12:50,428 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zOPKLDi,1342293292587.f89dbfc1dad366c991eba3c24822 > > 8637. > > > > 2012-07-16 00:12:50,562 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > > > 2012-07-16 00:12:50,685 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_status_list,zOIR8Tw,1342283152067.6284e9f69042266cde88a7b19071 > > d701. > > > > 2012-07-16 00:12:50,791 INFO > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request to > > > > open region: > > > > > > > > > > > > > trackurl_comment_list,zOqzaNw,1342195266309.bd8e19c1063f4f2984f9ca63365 > > c5573. > > > > > > > > > --e89a8f13ea88e6a53c04c507dd62--