hadoop-yarn-issues mailing list archives

From "yimeng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-9157) Failed deletion of dirs in yarn.nodemanager.local-dirs causes files to accumulate under yarn.nodemanager.local-dirs and depletes the operating system's inodes
Date Tue, 25 Dec 2018 02:13:00 GMT
yimeng created YARN-9157:
----------------------------

             Summary: Failed deletion of dirs in yarn.nodemanager.local-dirs causes files to accumulate
under yarn.nodemanager.local-dirs and depletes the operating system's inodes
                 Key: YARN-9157
                 URL: https://issues.apache.org/jira/browse/YARN-9157
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications/distributed-shell
    Affects Versions: 3.1.1, 3.0.1, 2.7.5, 2.8.3, 2.7.2
            Reporter: yimeng
         Attachments: image-2018-12-25-10-03-51-070.png

A YARN task failed to execute because an excessive number of files under the path
yarn.nodemanager.local-dirs exhausted the file system's inodes, which caused the compute task to fail.

!image-2018-12-25-09-53-15-067.png!

Checking the NM logs, we found that many localized dirs failed to be deleted because the user
was not found in the security system.

The actual size of the files in the local dir is 4.4 GB, not the 240859897 B printed in the log.

!image-2018-12-25-10-03-51-070.png!

The user was not found because our user information is stored in an LDAP database. When the LDAP
service has a problem, looking up the user fails (not because the user has been deleted). Once
the LDAP server recovers, the user information can be retrieved again.

The problem is that even though the user information can be retrieved later, the dirs whose
deletion failed earlier are never deleted, because they have already been removed from the
tracker list. This causes the dirs to accumulate.

I think the NM ResourceLocalizationService should check whether the Deletion Service thread
actually deleted the directory before removing it from the tracker list and LevelDB. If the
deletion failed, the directory should be added back to the tracker list, and cleanup should
continue with the next dirs until the local dirs size is below
yarn.nodemanager.localizer.cache.target-size-mb.
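
The proposed behavior could look roughly like the sketch below. It is not NodeManager code:
CachedDir, RetryingCacheCleaner and the Predicate-based deleter are hypothetical names used only
for illustration, and the deleter stands in for a deletion that reports whether the directory
was really removed. A dir whose deletion fails stays tracked, and cleanup moves on to the next
dir until the tracked size is at or below the target.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a tracked localized dir; not a real YARN class. */
final class CachedDir {
    final String path;
    final long sizeBytes;

    CachedDir(String path, long sizeBytes) {
        this.path = path;
        this.sizeBytes = sizeBytes;
    }
}

/** Hypothetical tracker that only forgets a dir once its deletion is confirmed. */
final class RetryingCacheCleaner {
    private final Deque<CachedDir> tracked = new ArrayDeque<>(); // oldest first
    private final long targetSizeBytes;
    private final Predicate<CachedDir> deleter; // true only if the dir was really deleted

    RetryingCacheCleaner(long targetSizeBytes, Predicate<CachedDir> deleter) {
        this.targetSizeBytes = targetSizeBytes;
        this.deleter = deleter;
    }

    void track(CachedDir dir) {
        tracked.addLast(dir);
    }

    long trackedSizeBytes() {
        return tracked.stream().mapToLong(d -> d.sizeBytes).sum();
    }

    /** Runs one cleanup pass and returns the dirs whose deletion failed. */
    List<CachedDir> cleanUp() {
        List<CachedDir> failed = new ArrayList<>();
        long size = trackedSizeBytes();
        int candidates = tracked.size();
        for (int i = 0; i < candidates && size > targetSizeBytes; i++) {
            CachedDir dir = tracked.pollFirst();
            if (deleter.test(dir)) {
                size -= dir.sizeBytes; // space was really reclaimed
            } else {
                failed.add(dir);       // still on disk, so it must stay tracked
            }
        }
        // Put failed dirs back at the front so the next pass retries them first.
        for (int i = failed.size() - 1; i >= 0; i--) {
            tracked.addFirst(failed.get(i));
        }
        return failed;
    }

    public static void main(String[] args) {
        boolean[] ldapUp = {false}; // simulates the user lookup being unavailable
        RetryingCacheCleaner cleaner = new RetryingCacheCleaner(
            100, d -> ldapUp[0] || !d.path.startsWith("userA"));
        cleaner.track(new CachedDir("userA/1", 80));
        cleaner.track(new CachedDir("userB/1", 80));
        cleaner.track(new CachedDir("userB/2", 80));

        System.out.println("failed in first pass: " + cleaner.cleanUp().size());

        ldapUp[0] = true; // lookup works again
        cleaner.track(new CachedDir("userB/3", 80));
        System.out.println("failed in second pass: " + cleaner.cleanUp().size());
        System.out.println("tracked bytes: " + cleaner.trackedSizeBytes());
    }
}
```

In this sketch the userA dir fails in the first pass while the lookup is down, stays in the
tracker, and is deleted in the second pass once the lookup succeeds again. A real fix would also
have to keep the LevelDB state consistent with the tracker list, which is left out here.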

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

