From: "James Kennedy (JIRA)"
To: issues@hbase.apache.org
Date: Fri, 14 Jan 2011 14:14:46 -0500 (EST)
Subject: [jira] Created: (HBASE-3445) Master crashes on when data moved to different host

Master crashes on when data moved to different host
---------------------------------------------------

                 Key: HBASE-3445
                 URL: https://issues.apache.org/jira/browse/HBASE-3445
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.0
            Reporter: James Kennedy
            Priority: Critical
             Fix For: 0.90.0


While testing an upgrade to
0.90.0 RC3, I noticed that if I seed our test data on one machine and then transfer it to another machine, the HMaster on the new machine dies on startup. Based on the stack trace below, it looks as though the master is trying to reach the .META. region at the IP address of the original machine. Instead of waiting for the RegionServers to register with their new location data, the HMaster gives up with a FATAL exception.

Note that deleting the ZooKeeper directory makes no difference.

Also note that so far I have only reproduced this in my own environment, using the hbase-trx extension of HBase and an ApplicationStarter that starts the Master and RegionServer together in the same JVM. While the issue seems likely to be independent of those factors, this setup is far from a vanilla HBase environment.

I will spend some time trying to reproduce the issue in a proper HBase test, but perhaps someone can beat me to it? How do I simulate the IP switch? It may require a data.tar upload.

[14/01/11 10:45:20] 6396 [ Thread-298] ERROR server.quorum.QuorumPeerConfig - Invalid configuration, only one server specified (ignoring)
[14/01/11 10:45:21] 7178 [ main] INFO ion.service.HBaseRegionService - troove> region port: 60010
[14/01/11 10:45:21] 7180 [ main] INFO ion.service.HBaseRegionService - troove> region interface: org.apache.hadoop.hbase.ipc.IndexedRegionInterface
[14/01/11 10:45:21] 7180 [ main] INFO ion.service.HBaseRegionService - troove> root dir: hdfs://localhost:8701/hbase
[14/01/11 10:45:21] 7180 [ main] INFO ion.service.HBaseRegionService - troove> Initializing region server.
[14/01/11 10:45:21] 7631 [ main] INFO ion.service.HBaseRegionService - troove> Starting region server thread.
[14/01/11 10:46:54] 100764 [ HMaster] FATAL he.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect.
ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
    at $Proxy14.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)
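The trace shows the connect timeout raised inside CatalogTracker.verifyMetaRegionLocation propagating all the way up to HMaster.run, where it becomes fatal. To illustrate the behaviour the report expects instead, here is a minimal Java sketch. It is not HBase's CatalogTracker code and not the eventual fix. Before trusting a recorded .META. location, it probes the address with a bounded TCP connect and treats any connection failure (timeout, refused, no route, unresolvable host) as "stale location, reassign" rather than as an unhandled exception. The class and method names (StaleLocationCheck, isReachable) are hypothetical. Only JDK classes (java.net.Socket, java.net.InetSocketAddress) are used, and the 20000 ms timeout in main simply mirrors the value in the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical sketch, not HBase's CatalogTracker API: decide whether a recorded
 * catalog-region location is still usable without letting a connect failure
 * escape as a fatal exception.
 */
final class StaleLocationCheck {

    private StaleLocationCheck() {}

    /**
     * Returns true only if a TCP connection to host:port completes within timeoutMs.
     * Any IOException (SocketTimeoutException, ConnectException, NoRouteToHostException,
     * UnknownHostException, ...) is reported as "unreachable" instead of being rethrown.
     */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // The recorded address no longer answers: the caller should treat the
            // location as stale and reassign, rather than shut the master down.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java StaleLocationCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // 20000 ms mirrors the connect timeout reported in the HMaster log of this issue.
        if (isReachable(host, port, 20_000)) {
            System.out.println(host + ":" + port + " answered; keep the recorded location");
        } else {
            System.out.println(host + ":" + port + " unreachable; treat it as stale and reassign");
        }
    }
}
```

For example, running it on the new machine against the stale address from the trace (192.168.1.102 60020) would be expected to report the location as unreachable once the connect timeout expires. That is the point at which the master would need to fall back to reassigning .META. instead of shutting down.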