hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From C G <parallel...@yahoo.com>
Subject Re: 0.16.4 DFS dropping blocks, then won't retart...
Date Fri, 23 May 2008 16:01:29 GMT
2008-05-23 11:53:25,377 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = primary/10.2.13.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4-dev
STARTUP_MSG:   build = svn+ssh://cgillett@svn.visiblemeasures.com/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4
-r 2182; compiled
 by 'cg' on Mon May 19 17:47:05 EDT 2008
************************************************************/
2008-05-23 11:53:26,107 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics
with hostName=NameNode, port=54310
2008-05-23 11:53:26,136 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: overdrive1-node-primary/10.2.13.1:54310
2008-05-23 11:53:26,146 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=NameNode, sessionId=null
2008-05-23 11:53:26,149 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics
using context object:org.apache.hadoop.metr
ics.spi.NullContext
2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=cg,cg
2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-05-23 11:53:36,064 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in
9788 msecs
2008-05-23 11:53:36,079 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.enter:
Safe mode is ON.
Safe mode will be turned off automatically.
2008-05-23 11:53:36,115 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2008-05-23 11:53:36,339 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-05-23 11:53:36,410 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-05-23 11:53:36,752 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@b1a4e2
2008-05-23 11:53:36,925 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-05-23 11:53:36,926 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2008-05-23 11:53:36,926 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@8c4a77
2008-05-23 11:53:36,926 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310:
starting
2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310:
starting
2008-05-23 11:53:36,940 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310:
starting
  2008-05-23 11:53:37,096 INFO org.apache.hadoop.dfs.NameNode: Error report from worker9:50010:
Incompatible build versions: na
menode BV = 2182; datanode BV = 652614
2008-05-23 11:53:37,097 INFO org.apache.hadoop.dfs.NameNode: Error report from worker12:50010:
Incompatible build versions: n
amenode BV = 2182; datanode BV = 652614
  [error above repeated for all nodes in system]
2008-05-23 11:53:42,082 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode:
node registration from 10.2.13.1:50010 st
orage DS-1855907496-10.2.13.1-50010-1198767012191
2008-05-23 11:53:42,094 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.2.13.1:50010

  Oddly enough, the DataNode associated with the master node is up and running:
   
  2008-05-23 11:53:25,380 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = primary/10.2.13.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4-dev
STARTUP_MSG:   build = svn+ssh://cgillett@svn.visiblemeasures.com/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4
-r 2182; compiled
 by 'cg' on Mon May 19 17:47:05 EDT 2008
************************************************************/
2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Registered FSDatasetStatusMBean
2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
2008-05-23 11:53:40,793 INFO org.apache.hadoop.dfs.DataNode: Balancing bandwith is 1048576
bytes/s
2008-05-23 11:53:41,838 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-05-23 11:53:41,868 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-05-23 11:53:42,051 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@1367e28
2008-05-23 11:53:42,079 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-05-23 11:53:42,081 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075
2008-05-23 11:53:42,081 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@17a4989
2008-05-23 11:53:42,101 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=DataNode, sessionId=null
2008-05-23 11:53:42,120 INFO org.apache.hadoop.dfs.DataNode: 10.2.13.1:50010In DataNode.run,
data = FSDataset{dirpath='/data/HDFS/data/curren
t'}
2008-05-23 11:53:42,121 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of
3368704msec Initial delay: 60000msec
2008-05-23 11:53:46,169 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 66383 blocks got
processed in 3027 msecs
2008-05-23 11:54:47,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_672882539393226281
2008-05-23 11:54:47,070 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_4933623861101284298
2008-05-23 11:54:51,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-4096375515223627412
2008-05-23 11:54:52,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-4329313062145243554
2008-05-23 11:54:52,869 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_5951529374648563965
2008-05-23 11:54:53,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_5526809302368511891
2008-05-23 11:55:07,101 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_3384706504442100270
2008-05-23 11:56:23,966 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-8100668927196678325
2008-05-23 11:56:24,165 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-9045089577001067802
2008-05-23 11:56:53,365 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-5156742068519955681
2008-05-23 11:56:53,375 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_8099933609289941991
2008-05-23 11:56:57,164 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-519952963742834206
2008-05-23 11:56:57,565 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-7514486773323267604
2008-05-23 11:56:59,366 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_5706035426017364787
2008-05-23 11:56:59,398 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-8163260915256505245
2008-05-23 11:57:08,455 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-3800057016159468929
2008-05-23 11:57:27,159 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-1945776220462007170
2008-05-23 11:57:41,058 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_1059797111434771921
2008-05-23 11:57:50,107 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_335910613100888045
2008-05-23 11:58:04,999 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-758702836140613218
2008-05-23 11:58:17,060 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_5680261036802662113
2008-05-23 11:58:31,128 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_6577967380328271133
2008-05-23 11:58:45,185 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-7268945479231310134
2008-05-23 11:58:59,450 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_5582966652198891861
2008-05-23 11:59:14,499 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded
for blk_-8204668722708860846

Raghu Angadi <rangadi@yahoo-inc.com> wrote:
  
Can you attach initialization part of NameNode?

thanks,
Raghu.

C G wrote:
> We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a problem where
DFS nodes could not communicate. After not finding anything obviously wrong we decided to
shut down DFS and restart. Following restart I was seeing a corrupted system with significant
amounts of missing data. Further checking showed that DataNodes on all slaves did not start
due to what looks like a version skew issue.
> 
> Our distribution is a straight 0.16.4 dist, so I'm having difficulty understanding what's
causing this issue.
> 
> Note that we haven't finalized the upgrade yet. 
> 
> Any help understanding this problem would be very much appreciated. We have several TB
of data in our system and reloading from scratch would be a big problem.
> 
> Here is the log from one of the failed nodes:
> 
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = worker9/10.2.0.9
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.16.4
> STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16
-r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
> ************************************************************/
> 2008-05-23 08:10:47,196 FATAL org.apache.hadoop.dfs.DataNode: Incompatible build versions:
namenode BV = 2182; datanode BV = 652614
> 2008-05-23 08:10:47,202 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible
build versions: namenode BV = 2182; datanode BV = 652614
> at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:342)
> at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:213)
> at org.apache.hadoop.dfs.DataNode.(DataNode.java:162)
> at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
> at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
> at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)
> 2008-05-23 08:10:47,203 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at worker9/10.2.0.9
> ************************************************************/
> 
> 



       
       
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message