hadoop-general mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: datanode goes down, maybe due to Unexpected problem in creating temporary file
Date Tue, 18 May 2010 01:19:30 GMT
[sjc1-qa-certiq1.sjc1:hadoop 25763]df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                     303346048 281730720   6206176  98% /
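As a side note, df's Use% is computed from used and available blocks, not the raw total, which is why used + available is noticeably less than 1K-blocks here (ext3 typically reserves about 5% of blocks for root). A small sketch of the arithmetic, using the numbers above:

```python
import math

# Figures from the df output above (1K blocks).
total = 303346048
used = 281730720
available = 6206176

# df reports Use% as used / (used + available), rounded up.
use_pct = math.ceil(100 * used / (used + available))
print(use_pct)  # 98, matching the df output

# The gap between total and used+available is the reserved space.
reserved_pct = 100 * (total - used - available) / total
print(round(reserved_pct, 1))  # roughly 5% reserved
```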

In the hadoop-hadoop-jobtracker log, the following was repeated for over 50 minutes:
2010-05-18 00:57:38,451 INFO org.apache.hadoop.mapred.JobTracker: Cleaning
up the system directory
2010-05-18 00:57:38,452 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: hdfs://
sjc1-qa-certiq1.sjc1.carrieriq.com:9000/tmp/hadoop/mapred/system
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
/tmp/hadoop/mapred/system. Name node is in safe mode.
The reported blocks 793 needs additional 2 blocks to reach the threshold
0.9990 of total blocks 796. Safe mode will be turned off automatically.
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1711)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1691)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:565)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy4.delete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:586)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:227)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1710)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:188)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:180)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3735)
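The safe-mode arithmetic in the message above can be reproduced with a short sketch. This assumes the NameNode compares reported blocks against int(total * threshold) with truncation, which is consistent with the "793 needs additional 2 blocks ... threshold 0.9990 of total blocks 796" wording (the default dfs.safemode.threshold.pct is 0.999):

```python
def blocks_needed(reported, total, threshold_pct=0.9990):
    # NameNode leaves safe mode once the number of reported blocks
    # reaches int(total * threshold) -- truncation, not rounding.
    required = int(total * threshold_pct)
    return max(0, required - reported)

# 796 * 0.999 = 795.2 -> 795 required; 795 - 793 = 2 more needed,
# matching "needs additional 2 blocks" in the exception message.
print(blocks_needed(793, 796))
```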

In the datanode log, two more entries were written after the report of 793 blocks was processed:
2010-05-17 16:56:22,135 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_5132109619501575576_2882
2010-05-17 17:06:22,245 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-4733675789688469676_1685
2010-05-17 17:16:22,383 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_5502091787302369348_2833
2010-05-17 17:26:22,507 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-6935269688635441525_1778
2010-05-17 17:36:22,709 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_8901766802925964345_1804
2010-05-17 17:41:43,929 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: *BlockReport of 793 blocks
got processed in 107 msecs*
2010-05-17 17:46:23,065 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-8536166246203803199_2384
2010-05-17 17:56:22,445 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_3069643184898592926_1667

I wonder what prevented the last two blocks from being recognized by the
NameNode?
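One way to narrow this down is to pull every NameNode and DataNode log line that mentions the two block IDs and compare the timelines. A minimal self-contained sketch of that filtering (the sample lines are abbreviated from the log above; real logs would be read from the log files):

```python
sample_log = """\
2010-05-17 17:41:43,929 INFO DataNode: BlockReport of 793 blocks got processed in 107 msecs
2010-05-17 17:46:23,065 INFO DataBlockScanner: Verification succeeded for blk_-8536166246203803199_2384
2010-05-17 17:56:22,445 INFO DataBlockScanner: Verification succeeded for blk_3069643184898592926_1667
"""

def lines_mentioning(log_text, block_id):
    # Keep every line that references the given block ID,
    # equivalent to: grep blk_... namenode.log datanode.log
    return [line for line in log_text.splitlines() if block_id in line]

for line in lines_mentioning(sample_log, "blk_3069643184898592926"):
    print(line)
```

If a block ID shows up in the DataNode log but never in the NameNode log, the block was scanned locally but never made it into a block report the NameNode accepted.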

Here is the log from the other datanode:

2010-05-17 14:38:10,838 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.32.56.170:50010,
storageID=DS-1307210060-10.32.56.170-50010-1274132290796, infoPort=50075,
ipcPort=50020)In DataNode.run, data =
FSDataset{dirpath='/home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/current'}
2010-05-17 14:38:10,839 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL
of 3600000msec Initial delay: 0msec
2010-05-17 14:38:10,887 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got
processed in 5 msecs
2010-05-17 14:38:10,888 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block
scanner.
2010-05-17 15:15:51,037 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got
processed in 3 msecs

Currently our QA testing is blocked.

Can someone provide help?

Thanks

On Mon, May 17, 2010 at 12:40 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> That blk doesn't appear in NameNode log.
>
> For datanode,
> 2010-05-15 00:09:31,023 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_926027507678171558_3620 src: /10.32.56.170:49172 dest: /
> 10.32.56.171:50010
> 2010-05-15 00:09:31,024 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_926027507678171558_3620 received exception java.io.IOException:
> Unexpected problem in creating temporary file for
> blk_926027507678171558_3620.  File
> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
> should not be present, but is.
> 2010-05-15 00:09:31,024 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-5814095875968936685_2910 received exception java.io.IOException:
> Unexpected problem in creating temporary file for
> blk_-5814095875968936685_2910.  File
> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_-5814095875968936685
> should not be present, but is.
> 2010-05-15 00:09:31,025 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.32.56.171:50010,
> storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> java.io.IOException: Unexpected problem in creating temporary file for
> blk_926027507678171558_3620.  File
> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
> should not be present, but is.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-15 00:09:31,025 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.32.56.171:50010,
> storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> 2010-05-15 00:19:28,334 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_926027507678171558_3620 src: /10.32.56.170:36887 dest: /
> 10.32.56.171:50010
> 2010-05-15 00:19:28,334 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_926027507678171558_3620 received exception java.io.IOException:
> Unexpected problem in creating temporary file for
> blk_926027507678171558_3620.  File
> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
> should not be present, but is.
> 2010-05-15 00:19:28,334 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.32.56.171:50010,
> storageID=DS-1723593983-10.32.56.171-50010-1273792791835, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> java.io.IOException: Unexpected problem in creating temporary file for
> blk_926027507678171558_3620.  File
> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
> should not be present, but is.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-15 00:29:25,635 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_926027507678171558_3620 src: /10.32.56.170:34823 dest: /
> 10.32.56.171:50010
>
>
> On Mon, May 17, 2010 at 11:43 AM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hi Ted,
>>
>> Can you please grep your NN and DN logs for blk_926027507678171558 and
>> pastebin the results?
>>
>> -Todd
>>
>> On Mon, May 17, 2010 at 9:57 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > Hi,
>> > We use CDH2 hadoop-0.20.2+228 which crashed on datanode smsrv10.ciq.com
>> >
>> > I found this in datanode log:
>> >
>> > 2010-05-15 07:37:35,955 INFO
>> > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > blk_926027507678171558_3620 src: /10.32.56.170:53378 dest: /
>> > 10.32.56.171:50010
>> > 2010-05-15 07:37:35,956 INFO
>> > org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> > blk_926027507678171558_3620 received exception java.io.IOException:
>> > Unexpected problem in creating temporary file for
>> > blk_926027507678171558_3620.  File
>> >
>> >
>> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
>> > should not be present, but is.
>> > 2010-05-15 07:37:35,956 ERROR
>> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> > 10.32.56.171:50010,
>> > storageID=DS-1723593983-10.32.56.171-50010-1273792791835,
>> infoPort=50075,
>> > ipcPort=50020):DataXceiver
>> > java.io.IOException: Unexpected problem in creating temporary file for
>> > blk_926027507678171558_3620.  File
>> >
>> >
>> /home/hadoop/m2m_3.0.x/3.0.trunk.39-270238/data/hadoop-data/dfs/data/tmp/blk_926027507678171558
>> > should not be present, but is.
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:398)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:376)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1133)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1022)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
>> >        at
>> >
>> >
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > Can someone provide some clue?
>> >
>> > Thanks
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
