hadoop-hdfs-user mailing list archives

From "Daniel Klinger" ...@web-computing.de>
Subject Re: Hadoop 2.6.0 - No DataNode to stop
Date Mon, 02 Mar 2015 12:53:39 GMT
Thanks for your help, but unfortunately this didn’t do the job. Here’s the shell script I wrote
to start my cluster. The scripts on the other node contain only the command to start the DataNode
or the NodeManager, respectively, each run as the right user (hdfs / yarn):

 

 

#!/bin/bash

# Start HDFS-------------------------------------------------------------------------------------------------------------------------

# Start Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode"

wait

 

# Start all Datanodes

export HADOOP_SECURE_DN_USER=hdfs

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"

wait

ssh root@hadoop-data.klinger.local 'bash startDatanode.sh'

wait

 

# Start Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager"

wait

 

# Start Nodemanager on all Nodes

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager"

wait

ssh root@hadoop-data.klinger.local 'bash startNodemanager.sh'

wait

 

# Start Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR"

#wait

 

# Start Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR"

wait

 

This script generates the following output:

 

starting namenode, logging to /var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out

starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out

starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out

starting resourcemanager, logging to /var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out

starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out

starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out

starting historyserver, logging to /var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

 

Here are my stop script and its output:

 

#!/bin/bash

# Stop HDFS------------------------------------------------------------------------------------------------

# Stop Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode"

 

# Stop all Datanodes

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"

ssh root@hadoop-data.klinger.local 'bash stopDatanode.sh'

 

# Stop Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager"

 

#Stop Nodemanager on all Hosts

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager"

ssh root@hadoop-data.klinger.local 'bash stopNodemanager.sh'

 

#Stop Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR"

 

#Stop Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR"

 

stopping namenode

no datanode to stop

no datanode to stop

stopping resourcemanager

stopping nodemanager

stopping nodemanager

nodemanager did not stop gracefully after 5 seconds: killing with kill -9

stopping historyserver

 

Is there maybe something wrong with my commands?

 

Greets

DK

 

From: Varun Kumar [mailto:varun.uid@gmail.com]
Sent: Monday, 2 March 2015 05:28
To: user
Subject: Re: Hadoop 2.6.0 - No DataNode to stop

 

1. Stop the service.

2. Change the ownership of the log and PID directories to hdfs once again.

3. Start the service as hdfs.

This will resolve the issue.
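
The ownership check behind step 2 could be sketched as below. This is only an illustration: the helper name dir_owned_by is hypothetical, stat -c assumes GNU coreutils (as on CentOS), and it assumes HADOOP_LOG_DIR and HADOOP_PID_DIR point at the directories the daemons actually use.

```shell
#!/bin/bash
# Hypothetical helper: succeed if directory $1 is owned by user $2.
# Returns 2 if the directory cannot be inspected at all.
dir_owned_by() {
  local dir="$1" user="$2" owner
  # GNU stat: %U prints the owning user's name.
  owner=$(stat -c '%U' "$dir" 2>/dev/null) || return 2
  [ "$owner" = "$user" ]
}
```

For example, dir_owned_by "$HADOOP_LOG_DIR" hdfs || echo "log dir is not owned by hdfs" reports a mismatch without changing anything; the same check can be repeated for "$HADOOP_PID_DIR".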

 

On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger <dk@web-computing.de> wrote:

Thanks for your answer. 

 

I put the FQDNs of the DataNodes in the slaves file on each node (one FQDN per line). Here’s
the full DataNode log after startup (the log of the other DataNode is exactly the same):

 

2015-03-02 00:29:41,841 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]

2015-03-02 00:29:42,207 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

2015-03-02 00:29:42,312 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

2015-03-02 00:29:42,313 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started

2015-03-02 00:29:42,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hadoop.klinger.local

2015-03-02 00:29:42,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0

2015-03-02 00:29:42,350 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010

2015-03-02 00:29:42,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s

2015-03-02 00:29:42,358 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 5

2015-03-02 00:29:42,458 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog

2015-03-02 00:29:42,462 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined

2015-03-02 00:29:42,474 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)

2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode

2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs

2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static

2015-03-02 00:29:42,494 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*

2015-03-02 00:29:42,499 INFO org.mortbay.log: jetty-6.1.26

2015-03-02 00:29:42,555 WARN org.mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq, using /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq_3168831075162569402

2015-03-02 00:29:43,205 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50075

2015-03-02 00:29:43,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs

2015-03-02 00:29:43,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup

2015-03-02 00:29:43,802 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue

2015-03-02 00:29:43,823 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020

2015-03-02 00:29:43,875 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020

2015-03-02 00:29:43,913 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null

2015-03-02 00:29:43,953 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>

2015-03-02 00:29:43,973 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to hadoop.klinger.local/10.0.1.148:8020 starting to offer service

2015-03-02 00:29:43,981 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2015-03-02 00:29:43,982 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting

2015-03-02 00:29:44,620 INFO org.apache.hadoop.hdfs.server.common.Storage: DataNode version: -56 and NameNode layout version: -60

2015-03-02 00:29:44,641 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /cluster/storage/datanode/in_use.lock acquired by nodename 1660@hadoop.klinger.local

2015-03-02 00:29:44,822 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-158097147-10.0.1.148-1424966425688

2015-03-02 00:29:44,822 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled

2015-03-02 00:29:44,825 INFO org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block files from trash.

2015-03-02 00:29:44,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=330980018;bpid=BP-158097147-10.0.1.148-1424966425688;lv=-56;nsInfo=lv=-60;cid=CID-a2c81934-b3ce-44aa-b920-436ee2f0d5a7;nsid=330980018;c=0;bpid=BP-158097147-10.0.1.148-1424966425688;dnuuid=a3b6c890-41ca-4bde-855c-015c67e6e0df

2015-03-02 00:29:44,996 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: /cluster/storage/datanode/current

2015-03-02 00:29:44,998 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /cluster/storage/datanode/current, StorageType: DISK

2015-03-02 00:29:45,035 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean

2015-03-02 00:29:45,057 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1425265856057 with interval 21600000

2015-03-02 00:29:45,064 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-158097147-10.0.1.148-1424966425688

2015-03-02 00:29:45,071 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current...

2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-158097147-10.0.1.148-1424966425688 on /cluster/storage/datanode/current: 56ms

2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-158097147-10.0.1.148-1424966425688: 64ms

2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current...

2015-03-02 00:29:45,129 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current: 0ms

2015-03-02 00:29:45,134 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 5ms

2015-03-02 00:29:45,138 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null) service to hadoop.klinger.local/10.0.1.148:8020 beginning handshake with NN

2015-03-02 00:29:45,316 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null) service to hadoop.klinger.local/10.0.1.148:8020 successfully registered with NN

2015-03-02 00:29:45,316 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode hadoop.klinger.local/10.0.1.148:8020 using DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000

2015-03-02 00:29:45,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to hadoop.klinger.local/10.0.1.148:8020 trying to claim ACTIVE state with txid=24

2015-03-02 00:29:45,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to hadoop.klinger.local/10.0.1.148:8020

2015-03-02 00:29:45,883 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1 blockreports 0 blocks total. Took 4 msec to generate and 126 msecs for RPC and NN processing.  Got back commands org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3d528774

2015-03-02 00:29:45,883 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-158097147-10.0.1.148-1424966425688

2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap

2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit

2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: 0.5% max memory 966.7 MB = 4.8 MB

2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: capacity      = 2^19 = 524288 entries

2015-03-02 00:29:45,894 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-158097147-10.0.1.148-1424966425688

2015-03-02 00:29:45,900 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-158097147-10.0.1.148-1424966425688 to blockPoolScannerMap, new size=1

 

 

dfsadmin -report (run as user hdfs on the NameNode) produced the following output. It looks
like both DataNodes are available:

 

Configured Capacity: 985465716736 (917.79 GB)

Present Capacity: 929892360192 (866.03 GB)

DFS Remaining: 929892302848 (866.03 GB)

DFS Used: 57344 (56 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 10.0.1.148:50010 (hadoop.klinger.local)

Hostname: hadoop.klinger.local

Decommission Status : Normal

Configured Capacity: 492732858368 (458.89 GB)

DFS Used: 28672 (28 KB)

Non DFS Used: 27942051840 (26.02 GB)

DFS Remaining: 464790777856 (432.87 GB)

DFS Used%: 0.00%

DFS Remaining%: 94.33%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Mon Mar 02 00:38:00 CET 2015

 

 

Name: 10.0.1.89:50010 (hadoop-data.klinger.local)

Hostname: hadoop-data.klinger.local

Decommission Status : Normal

Configured Capacity: 492732858368 (458.89 GB)

DFS Used: 28672 (28 KB)

Non DFS Used: 27631304704 (25.73 GB)

DFS Remaining: 465101524992 (433.16 GB)

DFS Used%: 0.00%

DFS Remaining%: 94.39%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Mon Mar 02 00:37:59 CET 2015
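
A report like the one above can also be checked from a script. The following is a minimal sketch, assuming the report contains a line of the form "Live datanodes (N):" as shown; the function name count_live_datanodes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a dfsadmin report on stdin and print the
# number N from the first "Live datanodes (N):" line, if there is one.
count_live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}
```

Run as the hdfs user, hdfs dfsadmin -report | count_live_datanodes would print the live DataNode count, which a script could compare against the number of lines in the slaves file.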

 

Any further thoughts?

 

Greets

DK

From: Ulul [mailto:hadoop@ulul.org]
Sent: Sunday, 1 March 2015 13:12
To: user@hadoop.apache.org
Subject: Re: Hadoop 2.6.0 - No DataNode to stop

 

Hi

Did you check that your slaves file is correct?
That the datanode process is actually running?
Did you check its log file?
That the datanode is available? (dfsadmin -report, through the web UI)

We need more detail

Ulul

On 28/02/2015 22:05, Daniel Klinger wrote:

Thanks, but I know how to kill a process in Linux. That doesn’t answer the question of why
the command says “no datanode to stop” instead of stopping the DataNode:
 
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
 
 

 

 

From: Surbhi Gupta [mailto:surbhi.gupta01@gmail.com]
Sent: Saturday, 28 February 2015 20:16
To: user@hadoop.apache.org
Subject: Re: Hadoop 2.6.0 - No DataNode to stop

 

Run jps to get the process ID of the DataNode.

Alternatively, run ps -fu <userid> for the user the DataNode is running as.

Then kill the process using kill -9.
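
The lookup described above could be sketched with pgrep instead of jps. This is an illustration only: the function name find_daemon_pids is hypothetical, and it assumes pgrep (procps) is installed on the node.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of processes whose full command line
# matches $1, optionally restricted to processes owned by user $2.
find_daemon_pids() {
  local pattern="$1" user="${2:-}"
  if [ -n "$user" ]; then
    pgrep -u "$user" -f "$pattern"
  else
    pgrep -f "$pattern"
  fi
}
```

For instance, find_daemon_pids 'org.apache.hadoop.hdfs.server.datanode.DataNode' hdfs should list a DataNode JVM running as hdfs, assuming its main class appears on the command line. The result can be compared with the number stored in the PID file before resorting to kill -9.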

On 28 Feb 2015 09:38, "Daniel Klinger" <dk@web-computing.de> wrote:

Hello,

 

I have used a lot of Hadoop distributions. Now I’m trying to install plain Apache Hadoop on a
small “cluster” for testing (2 CentOS VMs: 1 NameNode+DataNode, 1 DataNode). I followed the instructions
on the documentation site: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.

 

I’m starting the cluster as described in the chapter “Operating the Hadoop Cluster” (with
different users). Startup works fine. The PID files are created in /var/run,
and you can see that folders and files are created in the DataNode and NameNode directories. I’m
getting no errors in the log files.

 

When I try to stop the cluster, the services are stopped (NameNode, ResourceManager, etc.).
But when I stop the DataNodes I get the message “No DataNode to stop”. The PID file
and the in_use.lock file are still there, and if I try to start the DataNode again I get
an error that the process is already running. When I stop the DataNode as hdfs instead of
root, the PID file and the in_use.lock file are removed, but I still get the message “No DataNode
to stop”.
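
For context, a message like this typically comes from a check of the following kind: read the PID file, then probe the recorded process with kill -0. The sketch below is a simplified illustration and an assumption about that logic, not a copy of hadoop-daemon.sh; the function name stop_by_pidfile and any PID file path passed to it are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: stop a daemon via its PID file.
# Prints "no <name> to stop" when the PID file is missing or when the
# recorded process cannot be signalled by the calling user.
stop_by_pidfile() {
  local name="$1" pidfile="$2" pid
  if [ ! -f "$pidfile" ]; then
    echo "no $name to stop"      # no PID file at the expected path
    return 1
  fi
  pid=$(cat "$pidfile")
  # kill -0 sends no signal; it only tests whether the PID can be signalled.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "stopping $name"
    kill "$pid"
    rm -f "$pidfile"
    return 0
  fi
  echo "no $name to stop"        # stale PID, or process owned by another user
  rm -f "$pidfile"
  return 1
}
```

Under these assumptions the same message appears in two different situations: the script looks for the PID file at a path other than the one written at startup, or the file is found but the recorded process cannot be signalled. In the second case the file is still removed, which would match the behavior described above.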

 

What am I doing wrong?

 

Greets

dk

 





 

-- 

Regards,

Varun Kumar.P

