hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
Date Mon, 07 Sep 2015 05:10:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733241#comment-14733241 ]

stack edited comment on HBASE-14317 at 9/7/15 5:09 AM:
-------------------------------------------------------

This failed build has these zombie tests:

kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15446//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
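
For readers unfamiliar with how a report like the one above can be derived from console text, here is a minimal sketch. It is an assumption, not the actual logic of dev-support/findHangingTests.py: it presumes Surefire-style console lines ("Running <class>" when a test class starts and "Tests run: ... - in <class>" when it finishes) and treats any class that started but never reported a result as hanging. The function name and the regular expressions are hypothetical.

```python
import re

# Assumed Surefire-style console markers (hypothetical patterns, not taken
# from the real findHangingTests.py).
RUNNING = re.compile(r"^Running (org\.apache\.\S+)")
FINISHED = re.compile(r"^Tests run:.* - in (org\.apache\.\S+)")


def find_hanging_tests(console_text):
    """Return test classes that started but never reported a result."""
    started = []      # keep first-seen order for stable output
    finished = set()
    for raw in console_text.splitlines():
        line = raw.strip()
        match = RUNNING.match(line)
        if match:
            if match.group(1) not in started:
                started.append(match.group(1))
            continue
        match = FINISHED.match(line)
        if match:
            finished.add(match.group(1))
    return [name for name in started if name not in finished]
```

Under these assumptions, a class such as TestAccessController2 would be listed as hanging if the console shows its "Running" line but no matching "Tests run" line.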


Some overlap.

The hang is easy to reproduce locally. Looking at it, nothing in it is related to the WAL. I see

	at org.apache.hadoop.hbase.security.access.TestAccessController.testAccessControlClientGlobalGrantRevoke(TestAccessController.java:2226)

hung. Poking around, nothing obvious at the moment. Will be back.

I'm just going to commit this fat patch and then work on these seemingly unrelated zombies.




> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14317.branch-1.txt, 14317.branch-1.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt,
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt,
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt, 14317v11.txt,
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 14317v5.branch-1.2.txt, 14317v5.txt,
> 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch,
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch,
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. See the attached
> thread dump and associated log. What is interesting is that the syncers are waiting to take syncs
> to run and at the same time we want to flush, so we are waiting on a safe point, but there seems
> to be nothing in our ring buffer; did we go to roll the log and not add a safe point sync to clear
> out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
