hawq-dev mailing list archives

From "Shubham Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
Date Tue, 18 Jul 2017 01:21:00 GMT
Shubham Sharma created HAWQ-1504:

             Summary: Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
                 Key: HAWQ-1504
                 URL: https://issues.apache.org/jira/browse/HAWQ-1504
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Command Line Tools
            Reporter: Shubham Sharma
            Assignee: Radar Lei

After setting up an environment using the instructions provided under incubator-hawq/contrib/hawq-docker/,
restarting the docker containers leaves the namenode hung: it attempts a namenode -format
on every start.

Steps to reproduce this issue - 

- Navigate to incubator-hawq/contrib/hawq-docker
- make stop
- make start
- docker exec -it centos7-namenode bash
- ps -ef | grep java

The ps output shows namenode -format running:
[gpadmin@centos7-namenode data]$ ps -ef | grep java
hdfs        11    10  1 00:56 ?        00:00:06 /etc/alternatives/java_sdk/bin/java -Dproc_namenode
-Xmx1000m -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/usr/hdp/ -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender
org.apache.hadoop.hdfs.server.namenode.NameNode -format

Since namenode -format runs interactively and is waiting at this stage for a Yes/No
response, the namenode remains stuck indefinitely. This makes hdfs unavailable.

Root cause of the problem - 

In the dockerfiles present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test
and incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker directive ENTRYPOINT
executes entrypoint.sh during startup.

entrypoint.sh in turn executes start-hdfs.sh, which performs the following check:

if [ ! -d /tmp/hdfs/name/current ]; then
    su -l hdfs -c "hdfs namenode -format"
fi

My assumption is that this check looks for existing fsimage and edit logs: if they are absent,
the script treats this as a first-time initialization and formats the namenode. However, the
path /tmp/hdfs/name/current never exists on the namenode.
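The path mismatch fits the stock Hadoop defaults. Assuming the docker image never sets dfs.namenode.name.dir explicitly (an assumption on my part), the storage directory resolves from the defaults in hdfs-default.xml and core-default.xml, as this sketch shows:

```shell
# Sketch: why the namenode writes under /tmp/hadoop-hdfs/dfs/name.
# Assuming dfs.namenode.name.dir is never set, Hadoop's defaults apply:
#   dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name   (hdfs-default.xml)
#   hadoop.tmp.dir        = /tmp/hadoop-${user.name}            (core-default.xml)
# The daemon runs as user hdfs, so the substitution gives:
user=hdfs
hadoop_tmp_dir="/tmp/hadoop-${user}"
name_dir="${hadoop_tmp_dir}/dfs/name"
echo "${name_dir}/current"   # -> /tmp/hadoop-hdfs/dfs/name/current
```

That resolved path matches the fsimage location in the namenode log below, while the script tests /tmp/hdfs/name/current instead.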

From the namenode logs it is clear that fsimage and edit logs are written under /tmp/hadoop-hdfs/dfs/name/current.

2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: No edit log streams
2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Planning to load
image: FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000, cpktTxId=0000000000000000000)
2017-07-18 00:55:20,995 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode:
Loading 1 INodes.
2017-07-18 00:55:21,064 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf:
Loaded FSImage in 0 seconds.
2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image
for txid 0 from /tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000
2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to
save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log
segment at 1

Thus the wrong path in incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh
causes the namenode to hang on each restart of the containers, making hdfs unavailable.
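A possible fix (my sketch, not a committed patch; format_namenode_if_needed is an illustrative name) would be to test the directory the namenode actually uses, and to pass -nonInteractive so that even a wrong guess makes hdfs namenode -format abort instead of hanging on a Y/N prompt:

```shell
#!/bin/bash
# Sketch of a corrected guard for start-hdfs.sh. Assumptions:
#   - the real storage dir is /tmp/hadoop-hdfs/dfs/name (per the log above)
#   - format_namenode_if_needed is a hypothetical helper name
format_namenode_if_needed() {
    local name_dir="$1"
    if [ ! -d "${name_dir}/current" ]; then
        echo "no ${name_dir}/current, formatting"
        # -nonInteractive: abort rather than prompt if storage looks unexpected
        su -l hdfs -c "hdfs namenode -format -nonInteractive"
    else
        echo "found ${name_dir}/current, skipping format"
    fi
}

# In start-hdfs.sh this would be invoked with the real path:
# format_namenode_if_needed /tmp/hadoop-hdfs/dfs/name
```

Alternatively, setting dfs.namenode.name.dir explicitly to /tmp/hdfs/name in the image's hdfs-site.xml would reconcile the two paths without touching the script.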

This message was sent by Atlassian JIRA
