From: Nanheng Wu
To: user@hbase.apache.org
Subject: Re: Disabling a table taking very long time
Date: Tue, 1 Mar 2011 10:34:45 -0800

J-D,

It looks like I don't have that many files:

drwxr-xr-x - root supergroup 0 2011-02-28 23:21 /hbase/.META./1028785192
-rw-r--r-- 3 root supergroup 932 2011-02-03 19:23 /hbase/.META./1028785192/.regioninfo
drwxr-xr-x - root supergroup 0 2011-03-01 17:13 /hbase/.META./1028785192/info
-rw-r--r-- 3 root supergroup 11831330 2011-03-01 17:13 /hbase/.META./1028785192/info/6687606720393313750
drwxr-xr-x - root supergroup 0 2011-03-01 17:13 /hbase/.META./compaction.dir

So you think compaction on the .META. table on the RS could be the problem?
While the metaScanner is running I saw virtually no logs on that server,
though I did see compactions once the meta scan ended and region closes
started to happen. I wonder what the cluster is doing while the meta scan
is going.
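(A minimal sketch for answering the "more than 16 files" question quoted further down, using the paths from the listing above; the loop is only there to watch whether flushes keep adding store files while a meta scan runs:)

# count plain files (store files plus .regioninfo) under .META.
bin/hadoop dfs -lsr /hbase/.META. | grep -c '^-'

# watch the count over time; a steadily growing number would point at HBASE-3499
while true; do bin/hadoop dfs -lsr /hbase/.META. | grep -c '^-'; sleep 60; done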
I would definitely upgrade in the near future, it's just that I'm afraid
upgrading won't guarantee a fix if I don't understand the nature of the
problem more clearly. We also have another test cluster running the same
versions of everything on cheaper hardware, and it doesn't have this problem.

On Tue, Mar 1, 2011 at 9:52 AM, Jean-Daniel Cryans wrote:
> That long .META. scan might be the culprit, good catch!
>
> If you run this from the hadoop folder:
>
> bin/hadoop dfs -lsr /hbase/.META.
>
> Do you see a ton of files? Let's define "ton" as more than 16 files.
> If so, and I see you have a very high number of regions, then you
> could be hitting this problem:
> https://issues.apache.org/jira/browse/HBASE-3499
>
> In short, every few .META. changes will result in a flush, and it could
> be that the region server hosting it isn't able to keep up with the
> compactions. It's done that way so that, since append isn't supported in
> your hadoop, you won't lose too much data if the node dies, which is
> really bad when it happens to .META.
>
> In your case it might not be so bad to set your .META.'s MEMSTORE_SIZE
> back to the default value if you're not writing and hence not splitting.
> Running the script attached in that jira would do it (don't forget to
> restart HBase after running it).
>
> Lastly, upgrading to HBase 0.90.1 and a hadoop that supports append
> should be a priority.
>
> J-D
>
> On Tue, Mar 1, 2011 at 9:30 AM, Nanheng Wu wrote:
>> Hi J-D:
>>
>> I did the scan like you suggested but no splits came up. This kind
>> of makes sense to me, since we write HFiles using an MR job and then
>> pretty much use HBase for read-only, so maybe that's why the regions
>> never got split? I think my slow disable problem is related to this
>> RegionManager.metaScanner. I grepped for that in the master's log and I
>> saw:
>>
>> 11/03/01 15:30:37 INFO master.BaseScanner: RegionManager.metaScanner
>> scanning meta region {server: 10.146.14.38:60020, regionname:
>> .META.,,1, startKey: <>}
>> 11/03/01 15:55:40 INFO master.BaseScanner: RegionManager.metaScanner
>> scan of 8296 row(s) of meta region {server: 10.146.14.38:60020,
>> regionname: .META.,,1, startKey: <>} complete
>> 11/03/01 15:55:40 INFO master.BaseScanner: RegionManager.metaScanner
>> scanning meta region {server: 10.146.14.38:60020, regionname:
>> .META.,,1, startKey: <>}
>> Thread 27 (RegionManager.metaScanner):
>> 11/03/01 16:21:04 INFO master.BaseScanner: RegionManager.metaScanner
>> scan of 8296 row(s) of meta region {server: 10.146.14.38:60020,
>> regionname: .META.,,1, startKey: <>} complete
>> 11/03/01 16:21:04 INFO master.BaseScanner: RegionManager.metaScanner
>> scanning meta region {server: 10.146.14.38:60020, regionname:
>> .META.,,1, startKey: <>}
>> Thread 27 (RegionManager.metaScanner):
>> 11/03/01 16:46:07 INFO master.BaseScanner: RegionManager.metaScanner
>> scan of 8298 row(s) of meta region {server: 10.146.14.38:60020,
>> regionname: .META.,,1, startKey: <>} complete
>>
>> It looks like the meta scanner is always running, each time taking 20-30
>> minutes. Here's what I did:
>>
>> 1) disable table -> at 16:28:31; according to the log above there's a
>> meta scan in progress.
>>
>> 2) The command hangs, no output in the .META. RS's log.
>> At 11/03/01 16:46:57 the client got:
>> 11/03/01 16:46:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
>> /hbase/root-region-server got 10.146.14.38:60020
>> 11/03/01 16:46:58 DEBUG client.HConnectionManager$TableServers: Found
>> ROOT at 10.146.14.38:60020
>> 11/03/01 16:46:58 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.146.14.38:60020
>> This corresponds exactly to when the meta scan completed. The .META. RS
>> started to spew some logs about closing regions.
>>
>> 3) Another meta scan started and the client hangs again until it
>> finishes, and the disable command returns. The whole thing took
>> 2688.5930 seconds.
>>
>> What kind of problem does it look like to you, J-D? What does this
>> background task metaScanner do?
>>
>> Thanks!
>>
>> On Mon, Feb 28, 2011 at 5:17 PM, Jean-Daniel Cryans wrote:
>>> I didn't get most of the information I was looking for... but I do
>>> have something to work with. So I asked whether the metaScanner was
>>> making progress or not, and from that log I think I see it does:
>>>
>>>> 11/03/01 00:47:30 DEBUG client.HConnectionManager$TableServers:
>>>> Rowscanned=261, rowsOffline=3
>>>> 11/03/01 00:47:30 DEBUG client.HBaseAdmin: Sleep. Waiting for all
>>>> regions to be disabled from myTable
>>>
>>> Only 3 rows are offline, whereas we expect all the regions to be
>>> offline. I expect you see those lines a lot? If so, one thing I'd like
>>> to see is the result of this command:
>>>
>>> scan '.META.', {STARTROW => "myTable,,", LIMIT => 261}
>>>
>>> It's going to be big. Then grep in the result for the string SPLIT,
>>> and please post back here the lines that match.
>>>
>>> J-D
>>>
>>> On Mon, Feb 28, 2011 at 5:04 PM, Nanheng Wu wrote:
>>>> I issued two more disable commands and looked at the .META. RS. The
>>>> last log entry looks like:
>>>>
>>>> 11/03/01 00:29:14 INFO regionserver.HLog: Roll
>>>> /hbase/.logs/,60020,1298935751670/hlog.dat.1298935752660,
>>>> entries=13527, calcsize=3839046, filesize=2560143.
>>>> New hlog /hbase/.logs/,60020,1298935751670/hlog.dat.1298939354072
>>>>
>>>> I issued the command at 00:34:59; the client debug message has
>>>> nothing but "11/03/01 00:34:59 DEBUG zookeeper.ClientCnxn: Got ping
>>>> response for sessionid:0x12e6e938e9c000b after 90ms" until 00:47:19.
>>>> Then all of a sudden it outputs the following:
>>>>
>>>> 0:47:28 DEBUG client.HConnectionManager$TableServers: Cache hit for
>>>> row <> in tableName .META.: location server 10.146.14.38:60020,
>>>> location region name .META.,,1
>>>> 11/03/01 00:47:28 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
>>>> /hbase/root-region-server got 10.146.14.38:60020
>>>> 11/03/01 00:47:28 DEBUG client.HConnectionManager$TableServers: Found
>>>> ROOT at 10.146.14.38:60020
>>>> 11/03/01 00:47:28 DEBUG client.HConnectionManager$TableServers: Cached
>>>> location for .META.,,1 is 10.146.14.38:60020
>>>> 11/03/01 00:47:30 DEBUG client.HConnectionManager$TableServers:
>>>> Rowscanned=261, rowsOffline=3
>>>> 11/03/01 00:47:30 DEBUG client.HBaseAdmin: Sleep. Waiting for all
>>>> regions to be disabled from myTable
>>>> 11/03/01 00:47:31 DEBUG client.HBaseAdmin: Wake. Waiting for all
>>>> regions to be disabled from myTable
>>>>
>>>> Here's the full thread dump of the .META. RS from 00:34:59 to 00:47:31:
>>>> http://pastebin.com/Ywwx8afQ
>>>>
>>>> I hope I compiled some useful information here. I am pretty lost at
>>>> this point, I would really appreciate your help!
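(A minimal sketch of how a dump like the one above can be taken and narrowed down to the interesting threads; the jps-based pid lookup is an assumption about this setup, and it has to run on the master for the metaScanner thread and on the host carrying .META. for the handlers:)

# master side: is RegionManager.metaScanner stuck inside an RPC?
M_PID=$(jps | grep HMaster | awk '{print $1}')
jstack $M_PID | grep -A 15 'RegionManager.metaScanner'

# .META. region server side: what are the IPC handlers doing?
RS_PID=$(jps | grep HRegionServer | awk '{print $1}')
jstack $RS_PID | grep -A 15 'IPC Server handler'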
>>>>
>>>> Thanks!
>>>>
>>>> On Mon, Feb 28, 2011 at 2:58 PM, Jean-Daniel Cryans wrote:
>>>>> Ok so no debug from the disable command? BTW you should output your GC
>>>>> somewhere else like the .out file or a completely separate file.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Feb 28, 2011 at 2:54 PM, Nanheng Wu wrote:
>>>>>> I see, so I should jstack the .META. region. I'll do that.
>>>>>>
>>>>>> The master log pretty much looks like this: should I grep for
>>>>>> something specific?
>>>>>>
>>>>>> 11/02/28 22:52:56 INFO master.BaseScanner: RegionManager.rootScanner
>>>>>> scan of 1 row(s) of meta region complete
>>>>>> 2011-02-28T22:52:57.138+0000: 263241.014: [GC 263241.014: [ParNew:
>>>>>> 19136K->2112K(19136K), 0.0033350 secs] 378606K->361596K(641944K)
>>>>>> icms_dc=0 , 0.0034330 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
>>>>>> 2011-02-28T22:53:04.956+0000: 263248.832: [GC 263248.832: [ParNew:
>>>>>> 19068K->2112K(19136K), 0.0029690 secs] 378552K->361630K(641944K)
>>>>>> icms_dc=0 , 0.0030460 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
>>>>>> 2011-02-28T22:53:12.664+0000: 263256.540: [GC 263256.541: [ParNew:
>>>>>> 19136K->2112K(19136K), 0.0037690 secs] 378654K->361647K(641944K)
>>>>>> icms_dc=0 , 0.0038660 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>>>>> 2011-02-28T22:53:20.608+0000: 263264.484: [GC 263264.484: [ParNew:
>>>>>> 19136K->1810K(19136K), 0.0032600 secs] 378671K->361346K(641944K)
>>>>>> icms_dc=0 , 0.0033470 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
>>>>>> 2011-02-28T22:53:28.278+0000: 263272.154: [GC 263272.154: [ParNew:
>>>>>> 18834K->2112K(19136K), 0.0029870 secs] 378370K->361661K(641944K)
>>>>>> icms_dc=0 , 0.0030670 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>>>>> 2011-02-28T22:53:35.919+0000: 263279.795: [GC 263279.795: [ParNew:
>>>>>> 19136K->1747K(19136K), 0.0037090 secs] 378685K->361298K(641944K)
>>>>>> icms_dc=0 , 0.0037920 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
>>>>>> 11/02/28 22:53:36 INFO master.ServerManager: 9 region servers, 0 dead,
>>>>>> average load 760.2222222222222
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 2:49 PM, Jean-Daniel Cryans wrote:
>>>>>>> Are you able to tell if that call in metaScanner is hanging or if it's
>>>>>>> making multiple calls to the .META. region?
>>>>>>>
>>>>>>> If the former, then jstack the region server that hosts .META. and see
>>>>>>> where it's blocked.
>>>>>>>
>>>>>>> If the latter, then it means your .META. region is slow? Again, what's
>>>>>>> going on on the RS that hosts .META.?
>>>>>>>
>>>>>>> Finally, what's the master's log like during that time?
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu wrote:
>>>>>>>> J-D,
>>>>>>>>
>>>>>>>> Thanks so much for your help so far! I sent disable commands on 4
>>>>>>>> rather small tables and they got stuck for a long time again, so I
>>>>>>>> took a jstack of the master. From what I can tell, all disableTable
>>>>>>>> calls are blocked by a meta scanner thread (sample log below). At the
>>>>>>>> moment there were no other requests to the server at all. How should I
>>>>>>>> investigate this further? If it helps, here's some background: I have
>>>>>>>> several datasets, each in a separate table; our data
>>>>>>>> pipeline produces a new version of each dataset every day and only the
>>>>>>>> latest version should be used.
>>>>>>>> This is how the data is loaded: for
>>>>>>>> each dataset, 1) run an MR job that outputs HFiles, 2) call loadTable.rb
>>>>>>>> to create a new table, 3) disable and drop the previous version. As a
>>>>>>>> result some calls to load table and drop table would overlap. Please
>>>>>>>> let me know if something stands out to you as a potential culprit.
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> BTW, I am running Hadoop 0.20.2 with HBase 0.20.6
>>>>>>>>
>>>>>>>> Thread 47 (IPC Server handler 13 on 60000):
>>>>>>>>   State: BLOCKED
>>>>>>>>   Blocked count: 3801
>>>>>>>>   Waited count: 72719
>>>>>>>>   Blocked on java.lang.Object@75ac522c
>>>>>>>>   Blocked by 27 (RegionManager.metaScanner)
>>>>>>>>   Stack:
>>>>>>>>     org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:154)
>>>>>>>>     org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:842)
>>>>>>>>     sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>>>>>>>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>     java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>     org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>>>>>>>     org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>>>>>>>
>>>>>>>> Thread 27 (RegionManager.metaScanner):
>>>>>>>>   State: WAITING
>>>>>>>>   Blocked count: 1526058
>>>>>>>>   Waited count: 1488998
>>>>>>>>   Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@4dd44ab0
>>>>>>>>   Stack:
>>>>>>>>     java.lang.Object.wait(Native Method)
>>>>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>>>>     org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>>>>>>>>     org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>>>>>>>     $Proxy1.get(Unknown Source)
>>>>>>>>     org.apache.hadoop.hbase.master.BaseScanner.checkAssigned(BaseScanner.java:543)
>>>>>>>>     org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:192)
>>>>>>>>     org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>>>>>>>>     org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>>>>>>>>     org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>>>>>>>>     org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>>>>>>>
>>>>>>>> On Fri, Feb 25, 2011 at 10:23 AM, Jean-Daniel Cryans wrote:
>>>>>>>>> An hour to disable? That doesn't sound right at all :)
>>>>>>>>>
>>>>>>>>> I would approach this problem like I generally do with HBase issues:
>>>>>>>>> first check the master log for any weirdness regarding my problem (in
>>>>>>>>> this case, grep for the table name).
>>>>>>>>>
>>>>>>>>> Then I would look at the region server log(s) of the nodes that were
>>>>>>>>> hosting regions from that table. You should see the steps taken to
>>>>>>>>> disable the regions (starting to close, flush, region completely
>>>>>>>>> closed).
>>>>>>>>>
>>>>>>>>> If you are able to do it while it's taking a very long time to
>>>>>>>>> disable, try to jstack the process that seems to be hanging.
>>>>>>>>>
>>>>>>>>> Finally, like I said in another thread, there's a bug in 0.20.6 that
>>>>>>>>> almost prevents disabling (or re-enabling) a table if any region
>>>>>>>>> recently split and the parent wasn't cleaned yet from .META., and that
>>>>>>>>> is fixed in 0.90.1.
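(The log-checking steps described above, as a minimal sketch; the logs/hbase-*-... file names assume a default install, and myTable is the same placeholder used earlier in the thread:)

# on the master: anything odd mentioning the table being disabled?
grep -i 'myTable' logs/hbase-*-master-*.log | tail -n 50

# on a region server hosting its regions: are closes and flushes progressing?
grep -i 'myTable' logs/hbase-*-regionserver-*.log | grep -iE 'clos|flush' | tail -n 50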
>>>>>>>>>
>>>>>>>>> J-D
>>>>>>>>>
>>>>>>>>> On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu wrote:
>>>>>>>>>> I think you are right, maybe in the long run I need to re-architect my
>>>>>>>>>> system so that it doesn't need to create new and delete old tables all
>>>>>>>>>> the time. In the short term I am having a really hard time with the
>>>>>>>>>> disabling function. I ran a disable command on a very small table
>>>>>>>>>> (probably a dozen MBs in size) with no client using the cluster at
>>>>>>>>>> all, and it took about 1 hour to complete! The weird thing is on the
>>>>>>>>>> web UI only the region server carrying the META table has non-zero
>>>>>>>>>> requests; all other RS have 0 requests the entire time. I would think
>>>>>>>>>> they should get some request to flush the memstore at least. I *am*
>>>>>>>>>> using the same RS nodes for some map reduce job at the time, and top
>>>>>>>>>> shows the memory usage is almost full on the META region. Would you
>>>>>>>>>> have some idea of what I should investigate?
>>>>>>>>>> Thanks so much.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
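(The per-dataset load/drop cycle described earlier in the thread, as a minimal sketch in 0.20 terms; table names and the HFile output path are placeholders, and the loadtable.rb arguments should be double-checked against the copy shipped with the install:)

# 2) create today's table from the MR job's HFile output
bin/hbase org.jruby.Main bin/loadtable.rb mydataset_20110301 /user/hadoop/hfiles/mydataset_20110301

# 3) retire yesterday's version once the new table is serving
bin/hbase shell <<'EOF'
disable 'mydataset_20110228'
drop 'mydataset_20110228'
EOF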