hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7070) some of local cache files for yarn can't be deleted
Date Wed, 23 Aug 2017 16:31:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138597#comment-16138597 ]

Jason Lowe commented on YARN-7070:
----------------------------------

bq. At this point, I don't believe this is a YARN bug.

I disagree.  Even if the spark shuffle handler doesn't clean anything up, these files are
underneath the application's appcache area in YARN.  The nodemanager is supposed to clean
this up when the application completes regardless of what the auxiliary services are doing.

From the log we see it at least tried to do this:
{noformat}
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949
{noformat}

This looks a lot like the scenario that was fixed in YARN-6846, since the container is getting
killed just as the application completes.  Note how close together the two deletes occur:
{noformat}
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949/container_e24_1501810184023_55949_01_000079
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1501810184023_55949
2017-08-22 05:20:01,260 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1501810184023_55949
2017-08-22 05:20:01,260 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Application application_1501810184023_55949 removed, cleanupLocalDirs = false
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949
{noformat}
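
For anyone who wants to look for the same pattern in their own nodemanager log, here is a minimal Python sketch. It assumes each log record sits on a single line and uses the "Deleting absolute path" message format from the excerpt above; the function names and the one-second window are hypothetical choices for illustration, not anything YARN provides.

```python
import re
from datetime import datetime, timedelta

# Matches the LinuxContainerExecutor delete messages quoted in this thread.
# Assumption: one log record per line, timestamp at the start of the line.
DELETE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Deleting absolute path : (\S+)"
)


def parse_deletes(lines):
    """Return (timestamp, path) for every delete message found in lines."""
    deletes = []
    for line in lines:
        m = DELETE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            deletes.append((ts, m.group(2)))
    return deletes


def find_overlapping_deletes(lines, window=timedelta(seconds=1)):
    """Return (parent, child) path pairs whose deletes were logged within window.

    A directory and one of its subdirectories being deleted at nearly the same
    time is the pattern pointed out above.
    """
    deletes = parse_deletes(lines)
    hits = []
    for i, (ts_a, path_a) in enumerate(deletes):
        for ts_b, path_b in deletes[i + 1:]:
            if abs(ts_b - ts_a) > window:
                continue
            if path_a.startswith(path_b.rstrip("/") + "/"):
                hits.append((path_b, path_a))
            elif path_b.startswith(path_a.rstrip("/") + "/"):
                hits.append((path_a, path_b))
    return hits
```

Feeding it the lines of a nodemanager log (for example `find_overlapping_deletes(open(log_path))`, with `log_path` being whatever file you are inspecting) returns the application/container directory pairs that were deleted close together.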

These two deletes are probably racing each other.  I highly recommend applying the patch
from YARN-6846 and seeing whether things improve.
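
As a rough illustration of how two overlapping recursive deletes can interfere, here is a small Python sketch. It is not the YARN-6846 patch and not the nodemanager's actual delete code; it only assumes a deleter that lists a directory and then removes each entry. If such a deleter gives up on the first "no such file" error, an entry removed by the other deleter in the meantime can cause it to leave the remaining siblings behind. The hypothetical function below treats a vanished entry as already deleted and keeps going.

```python
import os


def delete_tree_tolerant(path):
    """Recursively delete path, treating entries that vanish mid-walk as done.

    Illustrative only: a concurrent deleter may remove any entry between the
    directory listing and the removal, so FileNotFoundError is not fatal here.
    """
    try:
        # Remove files and symlinks directly; never follow a symlink.
        if os.path.islink(path) or not os.path.isdir(path):
            os.unlink(path)
            return
        names = os.listdir(path)
    except FileNotFoundError:
        return  # someone else already removed it
    for name in names:
        delete_tree_tolerant(os.path.join(path, name))
    try:
        os.rmdir(path)
    except FileNotFoundError:
        pass  # the directory itself was removed concurrently
```

The design choice is simply that "already gone" counts as success for a delete, so one deleter losing the race on a subdirectory does not stop it from cleaning up the rest of the tree.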


> some of local cache files for yarn can't be deleted
> ---------------------------------------------------
>
>                 Key: YARN-7070
>                 URL: https://issues.apache.org/jira/browse/YARN-7070
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>         Environment: Hadoop 2.8.1
>            Reporter: Changyao Ye
>         Attachments: application_1501810184023_55949.log
>
>
> We have found that some of the YARN cache files (in /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache)
> on the nodemanager cannot be deleted properly. The leftover directories look like the following (blockmgr***):
> =================
> # ls -ltr application_1501810184023_55949
> total 120
> drwx--x---  2 hdfs yarn 4096 Aug 22 04:29 filecache
> drwxr-s---  2 hdfs yarn 4096 Aug 22 04:56 blockmgr-881fab2c-fba4-4bb1-8dd9-5ab35a512df7
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 04:56 blockmgr-bf8a19f5-e9ae-4269-a0ef-b27d0f9c17e7
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 04:58 blockmgr-f3437e8d-9595-4898-8bda-92ebff3ada1d
> drwxr-s--- 18 hdfs yarn 4096 Aug 22 05:01 blockmgr-930c0cd8-1d31-4cdb-a244-f6ad4bf74bff
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:13 blockmgr-83fc0702-ac40-4743-812a-7d488e92004e
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:13 blockmgr-f6cfe045-12c3-41d6-b77e-aa5200daeb6a
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:13 blockmgr-53dcb4ea-ba5d-4b8b-859b-805b9303a149
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-0c0c4bb9-ef5e-4ca1-8d23-ce5cd58d0a75
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:13 blockmgr-557d0f39-67d2-491a-9307-12fc1d724380
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-fbc87680-4df7-498e-bf6d-456a5aea4fc9
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-53ee8251-fac1-4f62-82c2-5e970f0d86ec
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:14 blockmgr-5a8bc187-abcf-482d-9da5-e8c4647d4731
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:14 blockmgr-251c3a99-cd85-442a-8945-52c344c0d861
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:14 blockmgr-c352c1ad-15dc-456b-8b62-5b83b9950494
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:15 blockmgr-b4f01347-4b51-4b35-8146-2aa840084c2b
> drwxr-s--- 14 hdfs yarn 4096 Aug 22 05:15 blockmgr-0095d26c-c134-48b4-82a6-e8ae02f0189c
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:15 blockmgr-28a31574-61ae-459f-be3a-8608892246d7
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-c0cd0df9-b355-4209-b6aa-b549a1fa36eb
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-a2730abb-9517-461e-bedf-d9a2dcef373f
> drwxr-s--- 14 hdfs yarn 4096 Aug 22 05:15 blockmgr-91dd2e1a-6bc2-4429-8b71-2f4240987159
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:15 blockmgr-f4e3a586-8817-45ea-a197-9fdbb3d91946
> drwxr-s--- 15 hdfs yarn 4096 Aug 22 05:15 blockmgr-ba2c605e-89d8-4f7c-b42c-6ed4ba6bf4ea
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-2ae72383-5f72-4002-84a7-e6335b8c2b6c
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:15 blockmgr-6c5e260f-d3c7-4af6-91c1-168c73343f2d
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-2e9923b1-281c-4a9d-8069-6c5430bd5fc3
> drwxr-s--- 18 hdfs yarn 4096 Aug 22 05:15 blockmgr-cc3f1406-d8a2-4bf5-a276-8f7aed75c513
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-975bcce0-84b2-4590-880b-bf182d76e319
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-ce82cb63-5998-4227-b85e-77f1c633db43
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-592af4aa-3c89-4081-8746-29b99f2220b1
> =================
> We also applied the patches for YARN-4594 and YARN-4731, but nothing changed.
> YARN-4594 https://issues.apache.org/jira/browse/YARN-4594
> YARN-4731 https://issues.apache.org/jira/browse/YARN-4731
> Any advice will be greatly appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


