Date: Fri, 30 May 2014 20:55:03 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Assigned] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop

     [ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe reassigned HDFS-5809:
------------------------------------------

    Assignee: Colin Patrick McCabe

Hi ikweesung,

Thanks for finding this.  Do you mind if I take this one?

> BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
>            Reporter: ikweesung
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the Hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks in my cluster.
> Then, at random, one datanode drops into an infinite loop, as the log shows, and eventually all datanodes drop into an infinite loop.
> Each datanode fails verification on just one block.
> When I check the failing block like this: hadoop fsck / -files -blocks | grep blk_1223474551535936089_4702249, no HDFS file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method drops into an infinite loop.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
>     && !datanode.blockScanner.blockScannerThread.isInterrupted()
>     && datanode.isBPServiceAlive(blockPoolId)) { ....
> The log message is printed in method verifyBlock (BlockPoolSliceScanner:453).
> Please excuse my poor English.
> -------------------------------------------------------------------------------------------------------------------------------------------------
> LOG:
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write

--
This message was sent by Atlassian JIRA
(v6.2#6252)
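
For readers trying to picture the symptom reported above (the same "Verification failed" line repeated for one block), here is a minimal sketch of how a scan loop can spin when a failing block is retried without making progress. It is not the Hadoop code: ScanLoopSketch, verify, naiveScan, boundedScan and the block name "stale" are all hypothetical, and the actual cause in BlockPoolSliceScanner may differ.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these names come from Hadoop.
class ScanLoopSketch {

    // Hypothetical verifier: the block named "stale" never verifies,
    // standing in for a block that no file references any more.
    static boolean verify(String block) {
        return !"stale".equals(block);
    }

    // Naive loop: a failed block goes back to the head of the queue, so it is
    // picked again immediately. maxIterations exists only so the demo
    // terminates. Returns the number of loop iterations executed.
    static int naiveScan(List<String> blocks, int maxIterations) {
        Deque<String> queue = new ArrayDeque<>(blocks);
        int iterations = 0;
        while (!queue.isEmpty() && iterations < maxIterations) {
            iterations++;
            String block = queue.pollFirst();
            if (!verify(block)) {
                queue.addFirst(block); // no progress: same block next time
            }
        }
        return iterations;
    }

    // Guarded loop: a failed block is retried behind the other blocks and is
    // dropped after maxRetries failures, so the loop always terminates.
    static int boundedScan(List<String> blocks, int maxRetries) {
        Deque<String> queue = new ArrayDeque<>(blocks);
        Map<String, Integer> failures = new HashMap<>();
        int iterations = 0;
        while (!queue.isEmpty()) {
            iterations++;
            String block = queue.pollFirst();
            if (!verify(block)) {
                int failed = failures.merge(block, 1, Integer::sum);
                if (failed < maxRetries) {
                    queue.addLast(block);
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("a", "stale", "b");
        System.out.println("naive iterations:   " + naiveScan(blocks, 1000));
        System.out.println("bounded iterations: " + boundedScan(blocks, 3));
    }
}
```

In this model the naive loop runs until the artificial cap and never reaches block "b", while the guarded loop finishes after a handful of iterations. Whether a retry limit is the right fix for HDFS-5809 is a separate question; the sketch only shows the shape of the failure.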