Date: Thu, 12 Jul 2012 02:42:34 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-6375) Master may be using a stale list of region servers for creating assignment plan during startup

    [ https://issues.apache.org/jira/browse/HBASE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412469#comment-13412469 ]

Hadoop QA commented on HBASE-6375:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12536140/HBASE-6375_trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                      org.apache.hadoop.hbase.master.TestHMasterRPCException

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2364//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2364//console

This message is automatically generated.
> Master may be using a stale list of region servers for creating assignment plan during startup
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6375
>                 URL: https://issues.apache.org/jira/browse/HBASE-6375
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
>         Environment: All
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6375_trunk.patch
>
>
> While investigating an Out of Memory issue, I had an interesting observation: the master tried to assign all regions to a single region server even though 7 others had already registered with it.
> As the cluster had MSLAB enabled, this resulted in an OOM on the RS when it tried to open all of them.
> *From master's log (edited for brevity):*
> {quote}
> 55,468 Waiting on regionserver(s) to checkin
> 56,968 Waiting on regionserver(s) to checkin
> 58,468 Waiting on regionserver(s) to checkin
> 59,968 Waiting on regionserver(s) to checkin
> 01,242 Registering server=srv109.datacenter,60020,1338673920529, regionCount=0, userLoad=false
> 01,469 Waiting on regionserver(s) count to settle; currently=1
> 02,969 Finished waiting for regionserver count to settle; count=1, sleptFor=46500
> 02,969 Exiting wait on regionserver(s) to checkin; count=1, stopped=false, count of regions out on cluster=0
> 03,010 Processing region \-ROOT\-,,0.70236052 in state M_ZK_REGION_OFFLINE
> 03,220 \-ROOT\- assigned=0, rit=true, location=srv109.datacenter:60020
> 03,221 Processing region .META.,,1.1028785192 in state M_ZK_REGION_OFFLINE
> 03,336 Detected completed assignment of META, notifying catalog tracker
> 03,350 .META. assigned=0, rit=true, location=srv109.datacenter:60020
> 03,350 Master startup proceeding: cluster startup
> 04,006 Registering server=srv111.datacenter,60020,1338673923399, regionCount=0, userLoad=false
> 04,012 Registering server=srv113.datacenter,60020,1338673923532, regionCount=0, userLoad=false
> 04,269 Registering server=srv115.datacenter,60020,1338673923471, regionCount=0, userLoad=false
> 04,363 Registering server=srv117.datacenter,60020,1338673923928, regionCount=0, userLoad=false
> 04,599 Registering server=srv127.datacenter,60020,1338673924067, regionCount=0, userLoad=false
> 04,606 Registering server=srv119.datacenter,60020,1338673923953, regionCount=0, userLoad=false
> 04,804 Registering server=srv129.datacenter,60020,1338673924339, regionCount=0, userLoad=false
> 05,126 Bulk assigning 1252 region(s) across 1 server(s), retainAssignment=true
> 05,546 hd109.datacenter,60020,1338673920529 unassigned znodes=207 of
> {quote}
> *A peek at the AssignmentManager code offers some explanation:*
> {code}
>   public void assignAllUserRegions() throws IOException, InterruptedException {
>     // Get all available servers
>     List<ServerName> servers = serverManager.getOnlineServersList();
>     // Scan META for all user regions, skipping any disabled tables
>     Map<HRegionInfo, ServerName> allRegions =
>       MetaReader.fullScan(catalogTracker, this.zkTable.getDisabledTables(), true);
>     if (allRegions == null || allRegions.isEmpty()) return;
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>     Map<ServerName, List<HRegionInfo>> bulkPlan = null;
>     if (retainAssignment) {
>       // Reuse existing assignment info
>       bulkPlan = LoadBalancer.retainAssignment(allRegions, servers);
>     } else {
>       // assign regions in round-robin fashion
>       bulkPlan = LoadBalancer.roundRobinAssignment(new ArrayList<HRegionInfo>(allRegions.keySet()), servers);
>     }
>     LOG.info("Bulk assigning " + allRegions.size() + " region(s) across " +
>       servers.size() + " server(s), retainAssignment=" + retainAssignment);
>     ...
> {code}
> In assignAllUserRegions(), listed above, the AM fetches the server list from ServerManager long before it actually uses it to create the assignment plan.
> In between, it performs a full scan of META to build the assignment map of regions. So even if additional RSes have registered in the meantime (as happened in this case), the AM still holds the old list with just one server.
> This code snippet is from 0.90.6 but the same issue exists in 0.92, 0.94 and trunk. Since MSLAB is enabled by default from 0.92 onwards, any large cluster can hit this issue upon cluster start-up when the following sequence holds true.
> # The master starts long before the RSes (by default, this lead is ~4.5 seconds).
> # All the RSes start together, but one wins the race of registering with the master by a few seconds.
> I am attaching a patch for trunk which moves the code that fetches the RS list from the beginning of the function to where it is first used.
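> In sketch form, the reordering looks like this (a minimal illustration based on the snippet above, not the literal attached patch; the ServerName/HRegionInfo type parameters are assumed, since they were stripped from the quoted snippet):
> {code}
>   public void assignAllUserRegions() throws IOException, InterruptedException {
>     // Scan META for all user regions, skipping any disabled tables
>     Map<HRegionInfo, ServerName> allRegions =
>       MetaReader.fullScan(catalogTracker, this.zkTable.getDisabledTables(), true);
>     if (allRegions == null || allRegions.isEmpty()) return;
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>     // Fetch the online server list only now, immediately before it is used,
>     // so any RS that registered during the (potentially slow) META scan is
>     // included in the assignment plan.
>     List<ServerName> servers = serverManager.getOnlineServersList();
>     Map<ServerName, List<HRegionInfo>> bulkPlan = null;
>     if (retainAssignment) {
>       // Reuse existing assignment info
>       bulkPlan = LoadBalancer.retainAssignment(allRegions, servers);
>     } else {
>       // assign regions in round-robin fashion
>       bulkPlan = LoadBalancer.roundRobinAssignment(
>           new ArrayList<HRegionInfo>(allRegions.keySet()), servers);
>     }
>     LOG.info("Bulk assigning " + allRegions.size() + " region(s) across " +
>       servers.size() + " server(s), retainAssignment=" + retainAssignment);
>     ...
>   }
> {code}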
> Apart from this change, one other HBase setting that now becomes important is "hbase.master.wait.on.regionservers.mintostart", due to MSLAB being enabled by default.
> Large clusters that keep MSLAB enabled must now raise "hbase.master.wait.on.regionservers.mintostart" above its default of 1, so that the master waits for a quorum of RSes sufficient to open all the regions among themselves; an example follows below. I'll create a separate JIRA for the documentation change.
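> For instance, a minimal hbase-site.xml sketch (the value is purely illustrative; on a cluster like the 8-server one above, something close to the full server count would be reasonable):
> {code:xml}
> <!-- Illustrative only: make the master wait for at least 7 region servers
>      to check in before it builds the startup assignment plan. -->
> <property>
>   <name>hbase.master.wait.on.regionservers.mintostart</name>
>   <value>7</value>
> </property>
> {code}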