hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-512) Log aggregation root directory check is more expensive than it needs to be
Date Wed, 29 May 2013 23:04:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669856#comment-13669856 ]

Vinod Kumar Vavilapalli commented on YARN-512:
----------------------------------------------

If the permissions are wrong at the time an application's dir is created, then the creation
will fail anyway, right?

bq. If filesystem caching is enabled, which it is by default, verifyAndCreate uses a cached
FileSystem instance, so no new connections are being created.
Except that idle connections are closed every 60 sec by default.

bq. By doing it at AM start, things recover automatically without NM restarts in case of transient
issues with the log dir. Also, having a file-exists and a get-permissions check per app to
be more resilient does not seem heavy. Still, we could reduce this to a single get-permissions
call; if the dir does not exist we'll get an FNE.
What sort of transient failures?

I'm looking at this from the point of view of the scalability issues we've seen on large clusters:
NMs DoS'ing the NameNode during log aggregation. A couple of fixes went in, e.g., to delay aggregation
until app finish. But opening this connection for every app is one thing we should avoid
completely if possible. I'll file a ticket; we can continue the discussion there.
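
To make the trade-off concrete, here is a rough sketch of one way to avoid a remote check per app while still recovering from transient problems. It is not YARN code: the class {{CachedCheck}}, its constructor parameters, and the time-to-live value are all hypothetical, and the {{BooleanSupplier}} merely stands in for whatever call actually stats the root log dir. A successful check is remembered for a fixed interval; failures are never cached, so the next caller retries.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of YARN: remembers a successful check for ttlMillis.
final class CachedCheck {
    private final BooleanSupplier expensiveCheck; // stand-in for a remote stat of the root log dir
    private final LongSupplier clockMillis;       // injectable clock, e.g. System::currentTimeMillis
    private final long ttlMillis;
    private boolean haveFreshSuccess;
    private long lastSuccessMillis;

    CachedCheck(BooleanSupplier expensiveCheck, LongSupplier clockMillis, long ttlMillis) {
        this.expensiveCheck = expensiveCheck;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Runs the expensive check only when no successful result younger than ttlMillis exists.
    synchronized boolean isUsable() {
        long now = clockMillis.getAsLong();
        if (haveFreshSuccess && now - lastSuccessMillis < ttlMillis) {
            return true; // served from the cached result, no remote call
        }
        boolean ok = expensiveCheck.getAsBoolean();
        haveFreshSuccess = ok; // a failure clears the cache so the next call retries
        if (ok) {
            lastSuccessMillis = now;
        }
        return ok;
    }
}
```

With something like this, many apps starting within the interval would share one check, at the cost of noticing a broken dir up to one interval late.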


                
> Log aggregation root directory check is more expensive than it needs to be
> --------------------------------------------------------------------------
>
>                 Key: YARN-512
>                 URL: https://issues.apache.org/jira/browse/YARN-512
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Maysam Yabandeh
>            Priority: Minor
>             Fix For: 2.0.5-beta
>
>         Attachments: YARN-512.patch
>
>
> The log aggregation root directory check first does an {{exists}} call followed by a
> {{getFileStatus}} call.  That effectively stats the file twice.  It should just use {{getFileStatus}}
> and catch {{FileNotFoundException}} to handle the non-existent case.
> In addition we may consider caching the presence of the directory rather than checking
> it each time a node aggregates logs for an application.
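
The difference between the two patterns described above can be sketched as follows. This is only an analogy, not the attached patch: it uses {{java.nio.file}} on the local filesystem as a stand-in for Hadoop's {{FileSystem}}, so {{Files.readAttributes}} plays the role of {{getFileStatus}} and {{NoSuchFileException}} the role of {{FileNotFoundException}}. The class {{RootDirCheck}} and its method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Illustration only: local-filesystem stand-in for the remote root-dir check.
final class RootDirCheck {

    enum State { MISSING, NOT_A_DIRECTORY, OK }

    // Two metadata lookups: an existence test, then a separate attribute read.
    static State checkTwice(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return State.MISSING;
        }
        BasicFileAttributes attrs = Files.readAttributes(dir, BasicFileAttributes.class);
        return attrs.isDirectory() ? State.OK : State.NOT_A_DIRECTORY;
    }

    // One metadata lookup: the exception handles the non-existent case.
    static State checkOnce(Path dir) throws IOException {
        try {
            BasicFileAttributes attrs = Files.readAttributes(dir, BasicFileAttributes.class);
            return attrs.isDirectory() ? State.OK : State.NOT_A_DIRECTORY;
        } catch (NoSuchFileException e) {
            return State.MISSING;
        }
    }
}
```

Both methods return the same answer for a given path; the second simply gets there with one lookup instead of two, which is what matters when each lookup is a NameNode round trip.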

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
