From: Barry Haddow
To: hbase-user@hadoop.apache.org
Subject: Region servers shut down with UnknownScannerException
Date: Mon, 29 Sep 2008 15:18:19 +0100
Message-Id: <200809291518.20147.bhaddow@inf.ed.ac.uk>

Hi,

I recently set up a small HBase cluster (v0.18) running on top of Hadoop v0.18.1. However, I'm observing that the region servers spontaneously shut themselves down, usually with an UnknownScannerException. For instance, this weekend I discovered that all four had shut down, with messages like the following in the logs:

2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 129.215.197.39:50010
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5829206400135277905_3045
2008-09-29 07:29:16,552 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
2008-09-29 07:46:35,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call next(-1347145425990165691) from 129.215.197.39:6999: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -1347145425990165691

The underlying HDFS seems fine: fsck reports the hbase directory as healthy. After a restart HBase seems fine too, but surely the region servers should stay up once they're started. Any suggestions?

regards
Barry

-- 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.