From: Brian Jeltema
Subject: Re: regions in transition
Date: Wed, 23 Dec 2015 07:20:28 -0500
To: user@hbase.apache.org

Update on this: deleting the contents of the /hbase-unsecure/region-in-transition
node did fix my problem with HBase finding my table regions. I still have a
problem though, possibly related.
I’m seeing OutOfMemory errors in the region server logs (modified slightly):

2015-12-23 06:52:37,466 INFO  [RS_LOG_REPLAY_OPS-p7:60020-0] handler.HLogSplitterHandler: worker p7.foo.net,60020,1450871487168 done with task /hbase-unsecure/splitWAL/WALs%2Fp15.foo.net%2C60020%2C1450535337455-splitting%2Fp15.foo.net%252C60020%252C1450535337455.1450535339318 in 68348ms
2015-12-23 06:52:37,466 ERROR [RS_LOG_REPLAY_OPS-p7:60020-0] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:713)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360)
        at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.close(HLogSplitter.java:1121)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1086)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:360)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:220)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:143)
        at org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:82)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The region servers are configured with an 8G heap.
I initially thought this might be a ulimit problem, so I bumped the open file
limit to about 10K and the process limit up to 2048, but that did not seem to
matter. What other parameters might be causing an OOM error?

Thanks
Brian

> On Dec 22, 2015, at 12:46 PM, Brian Jeltema wrote:
>
>>
>> You should really find out where your hmaster ui lives (there is a master UI
>> for every node provided by the apache project) because it gives you
>> information on the state of your system,
>
> I’m familiar with the HMaster UI. I’m looking at it now. It does not contain
> the information you describe. There is a list of region servers and a
> menu bar that contains: Home, Table Details, Local Logs, Debug Dump,
> Metrics Dump, HBase Configuration
>
> If I click on the Table Details item, I get a list of the tables. If I click
> on a table, there is a Tasks section that says
> No tasks currently running on this node.
>
> The region server logs do not contain any records relating to RITs, or
> really even regions.
> The master UI does not contain any information about RITs
> Version: HDP 2.2 -> HBase 0.98.4
>
> The zookeeper node /hbase-unsecure/region-in-transition contains a long list
> of items that are not removed when I restart the service. I think this is a
> side-effect of problems I had when I did the HDP 2.1 -> HDP 2.2 upgrade,
> which did not go well.
>
> I would like to remove or clear the /hbase-unsecure/region-in-transition node
> as an experiment. I’m just looking for guidance on whether that is a safe
> thing to do.
>
> Brian
>
>> but if you want to skip all that,
>> here are the instructions for OfflineRepair, without knowing what is
>> happening with your system (logs, master ui info) you can try this but at
>> your own risk.
>>
>> OfflineMetaRepair.
>> Description Below:
>> This code is used to rebuild meta off line from file system data.
>> If there are any problems detected, it will fail, suggesting actions for the
>> user to do to "fix" problems. If it succeeds, it will back up the previous
>> hbase:meta and -ROOT- dirs and write new tables in place.
>>
>> Stop HBase
>> zookeeper-client rmr /hbase
>> HADOOP_USER_NAME=hbase hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
>> start hbase
>>
>> ^ This has worked for me in some situations where I understood HDFS and
>> Zookeeper disagreed on region locations, but keep in mind I have tried this
>> on hbase 1.0.0 and your mileage may vary.
>>
>> We don't have your hbase version (you can even find this on the hbase shell)
>> We don't have logs msgs
>> We don't have master's view of your RITs
>>
>>
>> On Tue, Dec 22, 2015 at 11:52 AM, Brian Jeltema wrote:
>>
>>> I’m running Ambari 2.0.2 and HDP 2.2. I don’t see any of this displayed at
>>> master:60010.
>>>
>>> I really think this problem is the result of cruft in ZooKeeper. Does
>>> anybody know if it’s safe to delete the node?
>>>
>>>
>>>> On Dec 22, 2015, at 11:40 AM, Geovanie Marquez <geovanie.marquez@gmail.com> wrote:
>>>>
>>>> check hmaster:60010 under TASKS (between Software Attributes and Tables)
>>>> you will see if you have regions in transition. This will tell you which
>>>> regions are transitioning and you can go to those region server logs and
>>>> check them, I've run into a couple of these and every time they've talked
>>>> to me about their problem.
>>>>
>>>> Also, under Software Attributes you can check the HBase version.
>>>>
>>>> On Tue, Dec 22, 2015 at 11:29 AM, Ted Yu wrote:
>>>>
>>>>> From RegionListTmpl.jamon :
>>>>>
>>>>> <%if (onlineRegions != null && onlineRegions.size() > 0) %>
>>>>> ...
>>>>> <%else>
>>>>> Not serving regions
>>>>> </%if>
>>>>>
>>>>> The message means that there was no region online on the underlying
>>>>> server.
>>>>>
>>>>> FYI
>>>>>
>>>>> On Tue, Dec 22, 2015 at 7:18 AM, Brian Jeltema wrote:
>>>>>
>>>>>> Following up, if I look at the HBase Master UI in the Ambari console I
>>>>>> see links to all of the region servers. If I click on those links, the
>>>>>> Region Server page comes up and in the Regions section, it displays
>>>>>> ‘Not serving regions’. I’m not sure if that means something is
>>>>>> disabled, or it just doesn’t have any regions to serve.
>>>>>>
>>>>>>> On Dec 22, 2015, at 6:19 AM, Brian Jeltema wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Can you pick a few regions stuck in transition and check related
>>>>>>>> region server logs to see why they couldn't be assigned ?
>>>>>>>
>>>>>>> I don’t see anything in the region logs relating to any regions.
>>>>>>>
>>>>>>>>
>>>>>>>> Which release were you using previously ?
>>>>>>>
>>>>>>> HDP 2.1 -> HDP 2.2
>>>>>>>
>>>>>>> So is it safe to stop HBase and delete the ZK node?
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Mon, Dec 21, 2015 at 3:54 PM, Brian Jeltema wrote:
>>>>>>>>
>>>>>>>>> I am doing a cluster upgrade to the HDP 2.2 stack. For some reason,
>>>>>>>>> after the upgrade HBase cannot find any regions for existing tables.
>>>>>>>>> I believe the HDFS file system is OK. But looking at the ZooKeeper
>>>>>>>>> nodes, I noticed that many (maybe all) of the regions were listed in
>>>>>>>>> the ZooKeeper /hbase-unsecure/region-in-transition node. I suspect
>>>>>>>>> this could be causing a problem. Is it safe to stop HBase and delete
>>>>>>>>> that node?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Brian
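
One possible check for the "unable to create new native thread" error at the top of this message, assuming the region servers run on Linux: resource limits apply per process and are inherited when the process starts, so a limit raised in a login shell is not necessarily the one a running daemon sees. On Linux, threads also count against the "Max processes" limit. The sketch below parses the text of /proc/<pid>/limits and /proc/<pid>/status to compare the effective limits with the current thread count. The helper names (parse_limits, thread_count, report) are made up for illustration and are not part of HBase or Ambari; the region server pid has to be supplied by the caller.

```python
import re
from pathlib import Path


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        # Columns are separated by runs of two or more spaces, while limit
        # names such as "Max open files" contain only single spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def thread_count(status_text):
    """Return the value of the 'Threads:' field of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def report(pid):
    """Collect the limits relevant to thread creation for one running process.

    Assumption: a Linux /proc filesystem, and permission to read the
    entries of the given pid.
    """
    proc = Path("/proc") / str(pid)
    limits = parse_limits((proc / "limits").read_text())
    return {
        "max_processes": limits.get("Max processes"),
        "max_open_files": limits.get("Max open files"),
        "threads": thread_count((proc / "status").read_text()),
    }
```

Calling report(pid) with the region server's pid returns the soft and hard values the process is actually running with, which can be compared against the values that were set with ulimit.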