Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 13 Jul 2012 00:40:35 +0000 (UTC)
From: "Lars Hofhansl (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <1247040617.45357.1342140035052.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <292939911.38893.1342048654671.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HBASE-6375) Master may be using a stale list of
 region servers for creating assignment plan during startup
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413375#comment-13413375 ]

Lars Hofhansl commented on HBASE-6375:
--------------------------------------

Committed to 0.94 as well.

> Master may be using a stale list of region servers for creating assignment plan during startup
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6375
>                 URL: https://issues.apache.org/jira/browse/HBASE-6375
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
>         Environment: All
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-6375_94.patch, HBASE-6375_trunk.patch
>
>
> While investigating an Out of Memory issue, I had an interesting observation where the master tries to assign all regions to a single region server even though 7 others had already registered with it.
> As the cluster had MSLAB enabled, this resulted in OOM on the RS when it tried to open all of them.
> *From master's log (edited for brevity):*
> {quote}
> 55,468 Waiting on regionserver(s) to checkin
> 56,968 Waiting on regionserver(s) to checkin
> 58,468 Waiting on regionserver(s) to checkin
> 59,968 Waiting on regionserver(s) to checkin
> 01,242 Registering server=srv109.datacenter,60020,1338673920529,regionCount=0,userLoad=false
> 01,469 Waiting on regionserver(s) count to settle; currently=1
> 02,969 Finished waiting for regionserver count to settle; count=1,sleptFor=46500
> 02,969 Exiting wait on regionserver(s) to checkin; count=1, stopped=false,count of regions out on cluster=0
> 03,010 Processing region \-ROOT\-,,0.70236052 in state M_ZK_REGION_OFFLINE
> 03,220 \-ROOT\- assigned=0, rit=true, location=srv109.datacenter:60020
> 03,221 Processing region .META.,,1.1028785192 in state M_ZK_REGION_OFFLINE
> 03,336 Detected completed assignment of META, notifying catalog tracker
> 03,350 .META. assigned=0, rit=true, location=srv109.datacenter:60020
> 03,350 Master startup proceeding: cluster startup
> 04,006 Registering server=srv111.datacenter,60020,1338673923399,regionCount=0,userLoad=false
> 04,012 Registering server=srv113.datacenter,60020,1338673923532,regionCount=0,userLoad=false
> 04,269 Registering server=srv115.datacenter,60020,1338673923471,regionCount=0,userLoad=false
> 04,363 Registering server=srv117.datacenter,60020,1338673923928,regionCount=0,userLoad=false
> 04,599 Registering server=srv127.datacenter,60020,1338673924067,regionCount=0,userLoad=false
> 04,606 Registering server=srv119.datacenter,60020,1338673923953,regionCount=0,userLoad=false
> 04,804 Registering server=srv129.datacenter,60020,1338673924339,regionCount=0,userLoad=false
> 05,126 Bulk assigning 1252 region(s) across 1 server(s), retainAssignment=true
> 05,546 hd109.datacenter,60020,1338673920529 unassigned znodes=207 of
> {quote}
> *A peek at the AssignmentManager code offers some explanation:*
> {code}
>   public void assignAllUserRegions() throws IOException, InterruptedException {
>     // Get all available servers
>     List<HServerInfo> servers = serverManager.getOnlineServersList();
>     // Scan META for all user regions, skipping any disabled tables
>     Map<HRegionInfo,HServerAddress> allRegions =
>       MetaReader.fullScan(catalogTracker, this.zkTable.getDisabledTables(), true);
>     if (allRegions == null || allRegions.isEmpty()) return;
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>     Map<HServerInfo, List<HRegionInfo>> bulkPlan = null;
>     if (retainAssignment) {
>       // Reuse existing assignment info
>       bulkPlan = LoadBalancer.retainAssignment(allRegions, servers);
>     } else {
>       // assign regions in round-robin fashion
>       bulkPlan = LoadBalancer.roundRobinAssignment(new ArrayList<HRegionInfo>(allRegions.keySet()), servers);
>     }
>     LOG.info("Bulk assigning " + allRegions.size() + " region(s) across " +
>       servers.size() + " server(s), retainAssignment=" + retainAssignment);
>     ...
> {code}
> In the function assignAllUserRegions(), listed above, AM fetches the server list from ServerManager long before it actually uses it to create the assignment plan.
> In between, it performs a full scan of META to create an assignment map of regions. So even if additional RSes have registered in the meantime (as happened in this case), AM still has the old list of just one server.
> This code snippet is from 0.90.6, but the same issue exists in 0.92, 0.94 and trunk. Since MSLAB is enabled by default from 0.92 onwards, any large cluster can hit this issue upon cluster start-up when the following sequence holds true:
> # The Master starts long before the RSes (by default, "long" here is ~= 4.5 seconds).
> # All the RSes start together, but one wins the race of registering with the Master by a few seconds.
> I am attaching a patch for the trunk which moves the code that fetches the RS list from the beginning of the function to where it is first used.
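The effect of that reordering can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `Registry`, `slowScan` and the server names are hypothetical stand-ins invented for illustration, not HBase classes, and the "scan" is modelled deterministically in a single thread rather than with real concurrency.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the startup race; none of these names are HBase classes. */
public class StaleServerListDemo {

    /** Hypothetical stand-in for the component that tracks registered servers. */
    static class Registry {
        private final List<String> online = new ArrayList<>();

        void register(String server) {
            online.add(server);
        }

        /** Returns a point-in-time copy of the currently registered servers. */
        List<String> snapshot() {
            return new ArrayList<>(online);
        }
    }

    /** Stand-in for the META scan: while it "runs", more servers check in. */
    static void slowScan(Registry registry, int lateServers) {
        for (int i = 0; i < lateServers; i++) {
            registry.register("late-rs-" + i);
        }
    }

    /** Original order: snapshot first, scan second. Returns servers visible to the plan. */
    public static int serversSeenWithEarlySnapshot(int lateServers) {
        Registry registry = new Registry();
        registry.register("first-rs");
        List<String> servers = registry.snapshot(); // taken before the scan, so it goes stale
        slowScan(registry, lateServers);
        return servers.size();
    }

    /** Reordered: scan first, snapshot right before it is needed for planning. */
    public static int serversSeenWithLateSnapshot(int lateServers) {
        Registry registry = new Registry();
        registry.register("first-rs");
        slowScan(registry, lateServers);
        List<String> servers = registry.snapshot(); // taken where it is first used
        return servers.size();
    }

    public static void main(String[] args) {
        System.out.println("early snapshot: " + serversSeenWithEarlySnapshot(7) + " server(s)");
        System.out.println("late snapshot:  " + serversSeenWithLateSnapshot(7) + " server(s)");
    }
}
```

With seven servers registering during the simulated scan, the early snapshot sees only the first server while the late snapshot sees all eight, which mirrors the "across 1 server(s)" line in the log above.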
> Apart from this change, one other HBase setting that now becomes important is "hbase.master.wait.on.regionservers.mintostart", due to MSLAB being enabled by default.
> Large clusters that keep MSLAB enabled must now raise "hbase.master.wait.on.regionservers.mintostart" from its default of 1 to a number that ensures the master waits for a quorum of RSes sufficient to open all the regions among themselves. I'll create a separate JIRA for the documentation change.
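One rough way to pick a value for "hbase.master.wait.on.regionservers.mintostart" is a ceiling division of the region count by the number of regions a single RS can safely open. The sketch below is not an HBase formula: the helper name `minServersToStart` and the per-server capacity of 200 are assumptions chosen purely for illustration, and the calculation assumes regions are spread evenly across servers. An operator would derive the real capacity from heap size and memstore settings.

```java
/** Back-of-the-envelope sizing helper; hypothetical, not part of HBase. */
public class MinRegionServers {

    /**
     * Smallest number of servers such that, with regions spread evenly,
     * no server has to open more than maxRegionsPerServer regions.
     */
    public static int minServersToStart(int totalRegions, int maxRegionsPerServer) {
        if (totalRegions < 0 || maxRegionsPerServer <= 0) {
            throw new IllegalArgumentException("region count must be >= 0 and capacity > 0");
        }
        // Ceiling division, never going below the default of 1.
        int needed = (totalRegions + maxRegionsPerServer - 1) / maxRegionsPerServer;
        return Math.max(1, needed);
    }

    public static void main(String[] args) {
        // 1252 regions as in the log above; 200 regions per server is an assumed capacity.
        System.out.println(minServersToStart(1252, 200));
    }
}
```

Under these assumptions the master would need to wait for at least 7 region servers before building the bulk assignment plan.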

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira