hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7695) Intermittent failures in TestOpenFilesWithSnapshot
Date Wed, 28 Jan 2015 22:50:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295998#comment-14295998

Konstantin Shvachko commented on HDFS-7695:

This was partly investigated under HDFS-7611. The symptoms looked similar to the bug described
Different test cases fail on different runs, all with the same exception:
java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
	at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1200)
	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1825)
	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1786)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testParentDirWithUCFileDeleteWithSnapShot(TestOpenFilesWithSnapshot.java:89)
The test:
- creates a file and starts writing data to it,
- then aborts the output stream,
- creates a snapshot while the file is still open (under construction),
- deletes the file without deleting the snapshot, and
- restarts the NameNode.

From the logs (I added extended logging) I see that on restart the NN replays the edits
according to the steps above. The blocks are then reported by DNs, but they remain at
0 replicas, and therefore the NN cannot leave SafeMode.
The missing blocks are supposed to be present, because even though the file was deleted its
snapshot was not. I do not understand why replicas are not added to the locations when they
are reported.
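To illustrate why blocks stuck at 0 replicas keep the NN in SafeMode, here is a simplified, hypothetical model of the exit condition (the class and method names are made up for illustration). It assumes the Hadoop 2.x defaults dfs.namenode.safemode.threshold-pct = 0.999 and dfs.namenode.replication.min = 1, and treats the check as a plain ratio; it does not reproduce the NameNode's exact arithmetic, the safe-mode extension period, or the minimum-DataNodes condition.

```java
import java.util.List;

// Hypothetical, simplified model of the NameNode SafeMode exit check.
public class SafeModeModel {
    // Assumed default of dfs.namenode.safemode.threshold-pct.
    static final double THRESHOLD_PCT = 0.999;
    // Assumed default of dfs.namenode.replication.min.
    static final int MIN_REPLICATION = 1;

    /** In this model a block is "safe" once it has at least MIN_REPLICATION reported replicas. */
    static boolean isSafe(int reportedReplicas) {
        return reportedReplicas >= MIN_REPLICATION;
    }

    /** Returns true if the fraction of safe blocks reaches the threshold (no blocks: trivially true). */
    static boolean canLeaveSafeMode(List<Integer> replicasPerBlock) {
        int total = replicasPerBlock.size();
        if (total == 0) {
            return true;
        }
        long safe = replicasPerBlock.stream().filter(r -> isSafe(r)).count();
        return (double) safe / total >= THRESHOLD_PCT;
    }

    public static void main(String[] args) {
        // Every block has reported replicas: the model allows leaving SafeMode.
        System.out.println(canLeaveSafeMode(List.of(3, 3, 3)));
        // One block referenced only by the snapshot never gains a replica: 2/3 < 0.999.
        System.out.println(canLeaveSafeMode(List.of(3, 3, 0)));
    }
}
```

Running main prints true and then false: as long as the snapshot-retained blocks are counted in the total but never get their reported replicas added, the safe fraction cannot reach the threshold, which matches the timeout in waitClusterUp().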

> Intermittent failures in TestOpenFilesWithSnapshot
> --------------------------------------------------
>                 Key: HDFS-7695
>                 URL: https://issues.apache.org/jira/browse/HDFS-7695
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: Konstantin Shvachko
> This is to investigate intermittent failures of {{TestOpenFilesWithSnapshot}}, which
> is timing out on the NameNode restart as it is unable to leave SafeMode.

This message was sent by Atlassian JIRA
