Date: Thu, 26 Jan 2017 21:14:24 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17501) NullPointerException after Datanodes Decommissioned and Terminated

[ https://issues.apache.org/jira/browse/HBASE-17501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840469#comment-15840469 ]

stack commented on HBASE-17501:
-------------------------------

[~lumost] Thanks for looking. We seem to only go to seekToNewSource if a ChecksumException. Yeah, you'd think that if an NPE or a repeated IOE, we should try a new source. Is that what you were thinking, sir?

> NullPointerException after Datanodes Decommissioned and Terminated
> ------------------------------------------------------------------
>
>         Key: HBASE-17501
>         URL: https://issues.apache.org/jira/browse/HBASE-17501
>     Project: HBase
>  Issue Type: Bug
> Environment: CentOS derivative with a derivative of the 3.18.43 kernel. HBase on CDH 5.9.0 with some patches. HDFS on CDH 5.9.0 with no patches.
>    Reporter: Patrick Dignan
>    Priority: Minor
>
> We recently encountered an interesting NullPointerException in HDFS that bubbles up to HBase, and is resolved by restarting the regionserver.
> The issue was exhibited while we were replacing a set of nodes in one of our clusters with a new set. We did the following:
> 1. Turn off the HBase balancer
> 2. Gracefully move the regions off the nodes we're shutting off, using a tool we wrote to do so
> 3. Decommission the datanodes using the HDFS exclude hosts file and hdfs dfsadmin -refreshNodes
> 4. Wait for the datanodes to decommission fully
> 5. Terminate the VMs the instances are running inside
>
> A few notes. We did not shut down the datanode processes, and the nodes were therefore not marked as dead by the namenode. We simply terminated the datanode VM (in this case an AWS instance). The nodes were marked as decommissioned. We are running our clusters with DNS, and when we terminate VMs, the associated CNAME is removed and no longer resolves. The errors do not seem to resolve without a restart.
>
> After we did this, the remaining regionservers started throwing NullPointerExceptions with the following stack trace:
>
> 2017-01-19 23:09:05,638 DEBUG org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.RW.fifo.Q.read.handler=80,queue=14,port=60020: callId: 1727723891 service: ClientService methodName: Scan size: 216 connection: 172.16.36.128:31538
> java.io.IOException
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2214)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:204)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1564)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1434)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1682)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:199)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:343)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:198)
>     at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2106)
>     at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2096)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5544)
>     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2569)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2555)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2536)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2405)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33738)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>     ... 3 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
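[Editor's note] The comment above suggests failing over to another replica not only on a checksum error but also on an NPE or a repeated IOException. A minimal, self-contained Java sketch of that retry shape follows. `SeekableSource`, `FlakySource`, and `readWithFailover` are hypothetical stand-ins for illustration only; the real Hadoop hook is `Seekable.seekToNewSource(long)`, and this is not HBase's or HDFS's actual code.

```java
import java.io.IOException;

public class SeekRetrySketch {

    // Stand-in for the small slice of org.apache.hadoop.fs.Seekable we care about.
    interface SeekableSource {
        int read(long pos) throws IOException;
        // Ask the stream to switch to a different replica for this position.
        boolean seekToNewSource(long pos) throws IOException;
    }

    // Per the comment above, today only checksum-style failures trigger
    // seekToNewSource; this sketch also fails over on an NPE (the
    // dead-but-decommissioned datanode case in this issue) or an IOException.
    static int readWithFailover(SeekableSource in, long pos) throws IOException {
        try {
            return in.read(pos);
        } catch (IOException | NullPointerException e) {
            // Try one alternate replica before surfacing the error.
            if (in.seekToNewSource(pos)) {
                return in.read(pos);
            }
            throw e instanceof IOException ? (IOException) e
                                           : new IOException("read failed", e);
        }
    }

    // A source whose current replica is "dead" (throws NPE, like
    // DFSInputStream.seek here) until we fail over to a new source.
    static class FlakySource implements SeekableSource {
        private boolean failedOver = false;

        public int read(long pos) throws IOException {
            if (!failedOver) throw new NullPointerException("dead replica");
            return 42; // payload byte from the healthy replica
        }

        public boolean seekToNewSource(long pos) {
            failedOver = true;
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // First read hits the dead replica, fails over, and succeeds.
        System.out.println(readWithFailover(new FlakySource(), 0L));
    }
}
```

The point of the sketch is only the catch clause: widening it from a checksum exception to `IOException | NullPointerException` is the behavior change the comment is weighing.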