Subject: Re: Failed deleting my ephemeral node
From: Ted Yu
To: user@hbase.apache.org
Date: Tue, 7 May 2013 09:29:30 -0700

Can you tell us a bit more about your ZooKeeper setup?

Checking the ZooKeeper log around 2013-04-16 14:31:24 would help, too.

Cheers

On Tue, May 7, 2013 at 6:05 AM, Fabien Chung wrote:

> Hi all,
>
> I have a cluster with 8 machines (CDH4). I use an ETL tool (Talend) to insert
> data into HBase. Most of the time that works perfectly, but sometimes rows are
> not inserted, and I don't have any clue about the reason for the failure. I
> have 0 errors on Talend. That usually happens when I delete the table in
> HBase and recreate a new one from Talend.
>
> I think these logs are relevant:
>
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker NODE11.ysance.local,60020,1366110719610 exiting
> 2013-04-16 14:31:09,611 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver60020.compactionChecker exiting
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
> 2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x13e128af3010001 closed
> 2013-04-16 14:31:09,727 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; all regions closed.
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
> 2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
> 2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
> 2013-04-16 14:31:10,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:10,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
> 2013-04-16 14:31:12,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:12,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2...
> 2013-04-16 14:31:16,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:16,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
> 2013-04-16 14:31:19,389 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closing leases
> 2013-04-16 14:31:19,390 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closed leases
> 2013-04-16 14:31:24,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:24,163 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
> 2013-04-16 14:31:24,164 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:137)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1215)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1204)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1068)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:846)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; zookeeper connection closed.
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> 2013-04-16 14:31:24,166 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>
> In my mind, the issue comes from ZooKeeper or the region server, but I can't
> really identify where exactly the problem is.
>
> Do you have any idea?
>
> Regards
>
> --
> Chung Fabien
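A note on the retry timing in the quoted log: the RetryCounter sleeps are 2000 ms, 4000 ms and 8000 ms, so they appear to double on each attempt, and together they account for the gap between the first SessionExpiredException (14:31:10,163) and the final "delete failed after 3 retries" (14:31:24,163). Below is a minimal Python sketch of that arithmetic. It assumes a plain doubling schedule starting at 2000 ms; `backoff_schedule` is a hypothetical helper written for this note, not HBase's RetryCounter code.

```python
from datetime import datetime

# Timestamp layout used by the quoted log lines, e.g. "2013-04-16 14:31:10,163".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def backoff_schedule(first_sleep_ms, retries):
    """Hypothetical helper: sleep before each retry, doubling every time."""
    return [first_sleep_ms * 2 ** attempt for attempt in range(retries)]


# Both timestamps are copied from the quoted region server log.
first_failure = datetime.strptime("2013-04-16 14:31:10,163", LOG_TIME_FORMAT)
final_failure = datetime.strptime("2013-04-16 14:31:24,163", LOG_TIME_FORMAT)

# Assumption: 3 retries starting at 2000 ms, matching the "Sleeping ...ms" lines.
sleeps = backoff_schedule(2000, 3)
elapsed_ms = int((final_failure - first_failure).total_seconds() * 1000)

print("sleeps (ms):", sleeps)
print("sum of sleeps (ms):", sum(sleeps))
print("elapsed in log (ms):", elapsed_ms)
```

The sum of the sleeps and the elapsed time in the log agree, which suggests the 14 seconds went almost entirely to waiting between retries rather than to the delete calls themselves.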