X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 33502101CF
	for ; Mon, 7 Sep 2015 05:10:47 +0000 (UTC)
Received: (qmail 83102 invoked by uid 500); 7 Sep 2015 05:10:47 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 83053 invoked by uid 500); 7 Sep 2015 05:10:47 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 83042 invoked by uid 99); 7 Sep 2015 05:10:46 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Sep 2015 05:10:46 +0000
Date: Mon, 7 Sep 2015 05:10:46 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733241#comment-14733241 ]

stack edited comment on HBASE-14317 at 9/7/15 5:09 AM:
-------------------------------------------------------

This fail has these zombies:

kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15446//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence

Some overlap.

The hang is easy to reproduce locally. Looking at it, there is nought related to WAL. I see at org.apache.hadoop.hbase.security.access.TestAccessController.testAccessControlClientGlobalGrantRevoke(TestAccessController.java:2226) hung... poking around, nothing plain at mo. Will be back.

I'm just going to commit this fat patch and then work on these seemingly unrelated zombies.


was (Author: stack):
This fail has these zombies:

kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15446//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence

Some overlap.

I'm just going to commit this fat patch and then work on these seemingly unrelated zombies.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14317.branch-1.txt, 14317.branch-1.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt, 14317v11.txt, 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. See attached thread dump and associated log. What is interesting is that syncers are waiting to take syncs to run and at same time we want to flush so we are waiting on a safe point but there seems to be nothing in our ring buffer; did we go to roll log and not add safe point sync to clear out ringbuffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)