hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@gmail.com>
Subject Re: 0.16.4 DFS dropping blocks, then won't restart...
Date Fri, 23 May 2008 17:25:28 GMT
If you look at the log message starting with "STARTUP_MSG:   build =..." you will see that the namenode and the good datanode were built by 'cg', whereas the bad datanodes were compiled by 'hadoopqa'!
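
A quick way to double-check this on every machine (just a sketch, assuming the standard conf/slaves file and that HADOOP_HOME points to the same install path on each node) is to compare the output of "bin/hadoop version", which prints the build revision and the "Compiled by" user:

  # run from HADOOP_HOME on the namenode
  bin/hadoop version
  for host in $(cat conf/slaves); do
    echo "== $host =="
    ssh "$host" "$HADOOP_HOME/bin/hadoop version"
  done

Any node that reports "Compiled by 'hadoopqa'" instead of 'cg' is still running the stock Apache build; redeploying the namenode's build to those datanodes before restarting should clear the "Incompatible build versions" errors.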

thanks,
dhruba

On Fri, May 23, 2008 at 9:01 AM, C G <parallelguy@yahoo.com> wrote:
> 2008-05-23 11:53:25,377 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = primary/10.2.13.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.16.4-dev
> STARTUP_MSG:   build = svn+ssh://cgillett@svn.visiblemeasures.com/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; compiled by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:26,107 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
> 2008-05-23 11:53:26,136 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: overdrive1-node-primary/10.2.13.1:54310
> 2008-05-23 11:53:26,146 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-05-23 11:53:26,149 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=cg,cg
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
> 2008-05-23 11:53:36,064 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 9788 msecs
> 2008-05-23 11:53:36,079 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.enter: Safe mode is ON.
> Safe mode will be turned off automatically.
> 2008-05-23 11:53:36,115 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
> 2008-05-23 11:53:36,339 INFO org.mortbay.util.Credential: Checking Resource aliases
> 2008-05-23 11:53:36,410 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
> 2008-05-23 11:53:36,752 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@b1a4e2
> 2008-05-23 11:53:36,925 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
> 2008-05-23 11:53:36,926 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
> 2008-05-23 11:53:36,926 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@8c4a77
> 2008-05-23 11:53:36,926 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting
> 2008-05-23 11:53:36,940 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting
>  2008-05-23 11:53:37,096 INFO org.apache.hadoop.dfs.NameNode: Error report from worker9:50010: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
> 2008-05-23 11:53:37,097 INFO org.apache.hadoop.dfs.NameNode: Error report from worker12:50010: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
>  [error above repeated for all nodes in system]
> 2008-05-23 11:53:42,082 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.2.13.1:50010 storage DS-1855907496-10.2.13.1-50010-1198767012191
> 2008-05-23 11:53:42,094 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.2.13.1:50010
>
>  Oddly enough, the DataNode associated with the master node is up and running:
>
>  2008-05-23 11:53:25,380 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = primary/10.2.13.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.16.4-dev
> STARTUP_MSG:   build = svn+ssh://cgillett@svn.visiblemeasures.com/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; compiled by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Registered FSDatasetStatusMBean
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
> 2008-05-23 11:53:40,793 INFO org.apache.hadoop.dfs.DataNode: Balancing bandwith is 1048576 bytes/s
> 2008-05-23 11:53:41,838 INFO org.mortbay.util.Credential: Checking Resource aliases
> 2008-05-23 11:53:41,868 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
> 2008-05-23 11:53:42,051 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@1367e28
> 2008-05-23 11:53:42,079 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
> 2008-05-23 11:53:42,081 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075
> 2008-05-23 11:53:42,081 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@17a4989
> 2008-05-23 11:53:42,101 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
> 2008-05-23 11:53:42,120 INFO org.apache.hadoop.dfs.DataNode: 10.2.13.1:50010In DataNode.run, data = FSDataset{dirpath='/data/HDFS/data/current'}
> 2008-05-23 11:53:42,121 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of 3368704msec Initial delay: 60000msec
> 2008-05-23 11:53:46,169 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 66383 blocks got processed in 3027 msecs
> 2008-05-23 11:54:47,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_672882539393226281
> 2008-05-23 11:54:47,070 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_4933623861101284298
> 2008-05-23 11:54:51,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-4096375515223627412
> 2008-05-23 11:54:52,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-4329313062145243554
> 2008-05-23 11:54:52,869 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5951529374648563965
> 2008-05-23 11:54:53,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5526809302368511891
> 2008-05-23 11:55:07,101 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3384706504442100270
> 2008-05-23 11:56:23,966 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8100668927196678325
> 2008-05-23 11:56:24,165 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-9045089577001067802
> 2008-05-23 11:56:53,365 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-5156742068519955681
> 2008-05-23 11:56:53,375 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_8099933609289941991
> 2008-05-23 11:56:57,164 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-519952963742834206
> 2008-05-23 11:56:57,565 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7514486773323267604
> 2008-05-23 11:56:59,366 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5706035426017364787
> 2008-05-23 11:56:59,398 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8163260915256505245
> 2008-05-23 11:57:08,455 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-3800057016159468929
> 2008-05-23 11:57:27,159 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-1945776220462007170
> 2008-05-23 11:57:41,058 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_1059797111434771921
> 2008-05-23 11:57:50,107 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_335910613100888045
> 2008-05-23 11:58:04,999 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-758702836140613218
> 2008-05-23 11:58:17,060 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5680261036802662113
> 2008-05-23 11:58:31,128 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_6577967380328271133
> 2008-05-23 11:58:45,185 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7268945479231310134
> 2008-05-23 11:58:59,450 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5582966652198891861
> 2008-05-23 11:59:14,499 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8204668722708860846
>
> Raghu Angadi <rangadi@yahoo-inc.com> wrote:
>
> Can you attach initialization part of NameNode?
>
> thanks,
> Raghu.
>
> C G wrote:
>> We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a problem where DFS nodes could not communicate. After not finding anything obviously wrong we decided to shut down DFS and restart. Following restart I was seeing a corrupted system with significant amounts of missing data. Further checking showed that DataNodes on all slaves did not start due to what looks like a version skew issue.
>>
>> Our distribution is a straight 0.16.4 dist, so I'm having difficulty understanding what's causing this issue.
>>
>> Note that we haven't finalized the upgrade yet.
>>
>> Any help understanding this problem would be very much appreciated. We have several TB of data in our system and reloading from scratch would be a big problem.
>>
>> Here is the log from one of the failed nodes:
>>
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG: host = worker9/10.2.0.9
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 0.16.4
>> STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
>> ************************************************************/
>> 2008-05-23 08:10:47,196 FATAL org.apache.hadoop.dfs.DataNode: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
>> 2008-05-23 08:10:47,202 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
>> at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:342)
>> at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:213)
>> at org.apache.hadoop.dfs.DataNode.&lt;init&gt;(DataNode.java:162)
>> at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
>> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
>> at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
>> at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)
>> 2008-05-23 08:10:47,203 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at worker9/10.2.0.9
>> ************************************************************/
>>
>>
>
>
>
>
>
