hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Kline <ch...@rapleaf.com>
Subject Re: Under replicated block doesn't get fixed until DFS restart
Date Tue, 08 Jan 2008 06:03:59 GMT
Sure.  Here are 2 blocks.

----- blk_4522585614366970680

hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-02:2008-01-02  
17:06:17,058 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.allocateBlock: /data/hbase1/hregion_70236052/ 
compaction.dir/hregion_70236052/info/mapfiles/-1/index.  
blk_4522585614366970680
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-02:2008-01-02  
17:06:17,062 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.31:7277 is  
added to blk_4522585614366970680

------ Note that the following is AFTER I restarted HDFS -------

hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:15,882 DEBUG org.apache.hadoop.dfs.StateChange:  
UnderReplicationBlocks.update blk_4522585614366970680 curReplicas 1  
curExpectedReplicas 3 oldReplicas 0 oldExpectedReplicas  3 curPri  0  
oldPri  3
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:15,882 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.UnderReplicationBlock.update:blk_4522585614366970680 has  
only 1 replicas and need 3 replicas so is added to neededReplications  
at priority level 0
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:58,808 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.UnderReplicationBlock.remove: Removing block  
blk_4522585614366970680 from priority queue 0
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:58,808 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.pendingTransfer: blk_4522585614366970680 is removed from  
neededReplications to pendingReplications
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:58,808 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.pendingTransfer: ask 10.100.11.31:7277 to replicate  
blk_4522585614366970680 to datanode(s) 10.100.11.59:7277  
10.100.11.64:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:59,716 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.blockReceived: blk_4522585614366970680 is received from  
10.100.11.59:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:59,716 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.59:7277 is  
added to blk_4522585614366970680
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:59,717 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.blockReceived: blk_4522585614366970680 is received from  
10.100.11.64:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:59,717 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.64:7277 is  
added to blk_4522585614366970680

------ blk_1984980330938654629

hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-02:2008-01-02  
17:11:17,231 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.allocateBlock: /data/hbase1/hregion_70236052/ 
compaction.dir/hregion_70236052/info/done. blk_1984980330938654629
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-02:2008-01-02  
17:11:17,234 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.31:7277 is  
added to blk_1984980330938654629

------ Note that the following is AFTER I restarted HDFS -------

hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:15,856 DEBUG org.apache.hadoop.dfs.StateChange:  
UnderReplicationBlocks.update blk_1984980330938654629 curReplicas 1  
curExpectedReplicas 3 oldReplicas 0 oldExpectedReplicas  3 curPri  0  
oldPri  3
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:15,856 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.UnderReplicationBlock.update:blk_1984980330938654629 has  
only 1 replicas and need 3 replicas so is added to neededReplications  
at priority level 0
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:55,805 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.UnderReplicationBlock.remove: Removing block  
blk_1984980330938654629 from priority queue 0
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:55,805 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.pendingTransfer: blk_1984980330938654629 is removed from  
neededReplications to pendingReplications
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:55,805 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.pendingTransfer: ask 10.100.11.31:7277 to replicate  
blk_1984980330938654629 to datanode(s) 10.100.11.62:7277  
10.100.11.61:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:56,660 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.blockReceived: blk_1984980330938654629 is received from  
10.100.11.62:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:56,660 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.62:7277 is  
added to blk_1984980330938654629
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:56,663 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.blockReceived: blk_1984980330938654629 is received from  
10.100.11.61:7277
hadoop-rapleaf-namenode-tf1.rapleaf.com.log.2008-01-03:2008-01-03  
16:17:56,663 INFO org.apache.hadoop.dfs.StateChange: BLOCK*  
NameSystem.addStoredBlock: blockMap updated: 10.100.11.61:7277 is  
added to blk_1984980330938654629

On Jan 7, 2008, at 12:32 PM, Raghu Angadi wrote:

>
> Could you grep for one of the block ids (say 4522585614366970680)  
> in the  namenode logs and post them here with timestamps?
>
> thanks,
> Raghu.
>
> Chris Kline wrote:
>> I believe there was at least one good block (see fsck output).   
>> All data nodes were up at the time according to the web page.  I  
>> grep'd the namenode log files for the under replicated blocks and  
>> only got an entry for when it was created and entries for when the  
>> replication was fixed after the HDFS restart.  Here is the result  
>> of fsck:
>> $HADOOP_HOME/bin/hadoop fsck /
>> .......................................................
>> /data/hbase1/hregion_70236052/compaction.dir/hregion_70236052/info/ 
>> done:  Under replicated blk_1984980330938654629. Target Replicas  
>> is 3 but found 1 replica(s).
>> ..
>> /data/hbase1/hregion_70236052/info/info/2807320534360768620:   
>> Under replicated blk_1717622121416314549. Target Replicas is 3 but  
>> found 1 replica(s).
>> .
>> /data/hbase1/hregion_70236052/info/mapfiles/2807320534360768620/ 
>> data:  Under replicated blk_-5019714262388221150. Target Replicas  
>> is 3 but found 1 replica(s).
>> .
>> /data/hbase1/hregion_70236052/info/mapfiles/2807320534360768620/ 
>> index:  Under replicated blk_4522585614366970680. Target Replicas  
>> is 3 but found 1 replica(s).
>> .........................................
>> /data/hbase1/log_10.100.11.63_1199307142676_60020/hlog.dat.000:   
>> Under replicated blk_-2871471426720379908. Target Replicas is 3  
>> but found 1 replica(s).
>> .......
>> /data/hbase1/log_10.100.11.65_1199307142711_60020/hlog.dat.000:  
>> MISSING 1 blocks of total size 0 B.
>> .Status: CORRUPT
>>  Total size:    71009158262 B
>>  Total blocks:  16318 (avg. block size 4351584 B)
>>  Total dirs:    21416
>>  Total files:   16253
>>  Over-replicated blocks:        0 (0.0 %)
>>  Under-replicated blocks:       5 (0.03064101 %)
>>  Target replication factor:     3
>>  Real replication factor:       2.9993873
>> The filesystem under path '/' is CORRUPT
>> -Chris
>> On Jan 4, 2008, at 1:02 PM, Raghu Angadi wrote:
>>> This is of course not expected. A more detailed info or log  
>>> message would help. Do you know if there is at least one good  
>>> block? Sometimes, the remaining "good" block might actually be  
>>> corrupted and thus can not replicate itself. Restarting might  
>>> just have brought up the datanodes that were down (for whatever  
>>> reason) before the restart.
>>>
>>> Raghu.
>>>
>>> Chris Kline wrote:
>>>> fsck reports several under replicated blocks, but these do not  
>>>> get fixed until I restart DFS.  fsck also reports a missing  
>>>> block at the same time, but this should affect the function of  
>>>> fixing under replicated blocks.  Has anyone seen this before?
>>>> I'm running 0.15.0.
>>>> -Chris Kline
>>>
>> -Chris
>> We're hiring engineers.  $10,007 reward for referrals we hire.
>

-Chris

We're hiring engineers.  $10,007 reward for referrals we hire.



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message