hadoop-common-dev mailing list archives

From "Igor Bolotin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5523) Datanode stops cleaning disk space
Date Wed, 18 Mar 2009 05:10:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682916#action_12682916 ]

Igor Bolotin commented on HADOOP-5523:
--------------------------------------

DF and DU sizes on the datanode match very closely with the information reported by the dfsadmin command.
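A rough Python sketch of this cross-check follows. It sums the apparent sizes of the files under a datanode data directory (similar to `du -sb`) and reads filesystem usage with `shutil.disk_usage` (similar to the `df` Used column). The function names are illustrative, not Hadoop utilities, and the comparison against the dfsadmin report is left as a manual step.

```python
import os
import shutil


def du_bytes(path: str) -> int:
    """Sum apparent sizes of regular files under path (roughly `du -sb path`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished mid-walk, e.g. a block replica being deleted
    return total


def df_used_bytes(path: str) -> int:
    """Bytes in use on the filesystem holding path (roughly the `df` Used column)."""
    return shutil.disk_usage(path).used
```

Running `du_bytes` on each data directory and summing gives a figure to set beside the per-node usage from dfsadmin; `df_used_bytes` also counts non-HDFS data on the same filesystem.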

lsof reports about 1000 open files in the DFS data directories on the problematic datanode, but
the total size of those open files is only about 10GB.
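For reference, a Linux-only Python sketch that approximates the lsof check: it scans the `/proc/<pid>/fd` symlinks for files under a given directory and sums their sizes. Sizes are read with `os.stat` on the fd link itself, so files that were deleted but are still held open are counted too (`du` cannot see such files, while `df` still charges their space). Processes the caller cannot inspect are skipped, so it would need to run as the datanode's user or as root. The function name is hypothetical.

```python
import os


def open_files_under(data_dir: str) -> tuple[int, int]:
    """Count distinct files under data_dir that any visible process holds open,
    and sum their sizes in bytes. Linux-only: walks /proc/<pid>/fd symlinks.
    Processes the caller may not inspect are skipped silently."""
    if not os.path.isdir("/proc"):
        return 0, 0
    prefix = os.path.realpath(data_dir).rstrip(os.sep) + os.sep
    sizes: dict[tuple[int, int], int] = {}  # (st_dev, st_ino) -> size; dedups shared files
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.startswith(prefix):
                    continue
                # stat through the /proc link so deleted-but-open files still count
                st = os.stat(fd_path)
            except OSError:
                continue
            sizes[(st.st_dev, st.st_ino)] = st.st_size
    return len(sizes), sum(sizes.values())
```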

Here is something interesting: fsck before the datanode restart reports a very significant number
of over-replicated blocks (~10% of all blocks):

Status: HEALTHY
 Total size:    1472758591906 B (Total open files size: 29050588133 B)
 Total dirs:    58431
 Total files:   375703 (Files currently being written: 418)
 Total blocks (validated):      387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
 Minimally replicated blocks:   387205 (100.0 %)
 Over-replicated blocks:        38782 (10.015883 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.1003888
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1

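The over-replication percentage can be recomputed from the raw fsck report. Below is a small Python sketch that extracts the `Label: number` pairs from a summary like the one above; the regular expression is fitted to this 0.19.0 output and is an assumption for other Hadoop versions.

```python
import re

# Matches report lines such as " Over-replicated blocks:        38782 (10.015883 %)".
# Fitted to the Hadoop 0.19.0 fsck output above; other versions may differ.
_STAT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ()-]*?):\s+([0-9.]+)")


def parse_fsck_summary(text: str) -> dict[str, float]:
    """Map each 'Label: <number>' line of an fsck summary to its first number."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:  # lines like "Status: HEALTHY" carry no number and are skipped
            stats[match.group(1)] = float(match.group(2))
    return stats


def over_replicated_percent(stats: dict[str, float]) -> float:
    """Over-replicated blocks as a percentage of validated blocks."""
    return 100.0 * stats["Over-replicated blocks"] / stats["Total blocks (validated)"]
```

Fed the pre-restart report, `over_replicated_percent` gives about 10.0159 %, in line with the figure fsck printed.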
After the datanode restart, the over-replicated blocks are practically gone:

Status: HEALTHY
 Total size:    1310669475298 B (Total open files size: 29535016933 B)
 Total dirs:    59431
 Total files:   377177 (Files currently being written: 387)
 Total blocks (validated):      386661 (avg. block size 3389712 B) (Total open file blocks (not validated): 607)
 Minimally replicated blocks:   386661 (100.0 %)
 Over-replicated blocks:        272 (0.070345856 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0007036
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1
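The average replication figures give a rough estimate of how many surplus replicas existed before and after the restart. This sketch assumes every file uses the default replication factor of 3 and that a surplus replica has the pre-restart average block size; both are simplifications, so the byte figure is only an order of magnitude.

```python
def surplus_replicas(blocks: int, avg_replication: float, target_replication: int = 3) -> float:
    """Estimate replicas beyond the target: (average - target) * blocks.

    Assumes every file uses target_replication; files created with another
    factor would skew the result.
    """
    return (avg_replication - target_replication) * blocks


# Figures from the two fsck reports above.
before = surplus_replicas(387205, 3.1003888)
after = surplus_replicas(386661, 3.0007036)
avg_block_size = 3803562  # bytes, pre-restart fsck "avg. block size"

print(round(before), round(after))  # 38871 272
print(round(before * avg_block_size / 1e9), "GB (rough)")  # 148 GB (rough)
```

On these numbers the estimate falls from about 38,871 surplus replicas (on the order of 148 GB) to about 272, which matches the 272 over-replicated blocks reported after the restart. The pre-restart estimate sits slightly above the 38782 over-replicated blocks, which under the same assumption would mean a few blocks carried more than one extra replica.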


> Datanode stops cleaning disk space
> ----------------------------------
>
>                 Key: HADOOP-5523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5523
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>         Environment: Linux
>            Reporter: Igor Bolotin
>            Priority: Critical
>
> Here is the situation: a DFS cluster running Hadoop version 0.19.0 on multiple servers with practically identical hardware.
> Everything works perfectly well except for one thing: from time to time one of the datanodes (a different node each time) starts to consume more and more disk space. If we do nothing, it keeps going until it runs out of space completely (ignoring the 20GB reserved space setting).
> Once restarted, it cleans up its disk rapidly and returns to approximately the same utilization as the rest of the datanodes in the cluster.
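For context on the reserved space setting: in Hadoop it is configured per volume with `dfs.datanode.du.reserved`, in bytes. The Python function below is a simplified model, an assumption rather than the actual DataNode code, of how such a reservation limits the space a volume offers to HDFS for new block replicas. The volume sizes in the usage lines are hypothetical.

```python
GIB = 1024 ** 3


def dfs_remaining(capacity: int, dfs_used: int, fs_available: int, reserved: int) -> int:
    """Simplified model (an assumption, not DataNode source) of the bytes a volume
    can still accept for new block replicas.

    capacity:     total size of the volume (df "Size")
    dfs_used:     bytes HDFS believes it has stored on the volume
    fs_available: bytes the filesystem reports as free (df "Avail")
    reserved:     the per-volume reservation, e.g. 20 GiB
    """
    remaining = capacity - reserved - dfs_used  # what HDFS may use on paper
    return max(0, min(remaining, fs_available))  # never more than is really free


# Hypothetical 1 TiB volume with a 20 GiB reservation:
print(dfs_remaining(1024 * GIB, 500 * GIB, 520 * GIB, 20 * GIB) // GIB)  # 504
print(dfs_remaining(1024 * GIB, 1010 * GIB, 14 * GIB, 20 * GIB))  # 0
```

Under this model a volume stops being offered to HDFS once its free space drops to the reserved amount, which is the limit the affected node in this report ends up exceeding.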

-- 
This message is automatically generated by JIRA.

