hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
Date Thu, 03 Aug 2017 20:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113417#comment-16113417 ]

Jason Lowe commented on YARN-6929:
----------------------------------

bq.  I think Max Bucket Size can be derived from yarn.log-aggregation.retain-seconds (in days)

The whole point of bucketing here is to avoid hitting the maximum item limit per directory.  Trying
to derive a bucket size from a time value like retention seconds means we need to know the
apps-per-second rate of the cluster, which we do not know.  In addition, it risks blowing the
directory limit if that rate ends up higher than what was calculated for a sustained period.
It makes more sense to derive the bucket size from the directory limit, since that's what's
driving this change.

bq. And why we need two sub directories (app_id/ bucket_size) and (app_id%bucket_size).

Sorry, we just need it to be (app_id / bucket_size); we don't need the modulo.  So the bucket
path for an app would be:
  aggregation_log_root / user / cluster_timestamp / (app_id / bucket_size)
with at most bucket_size app directories in each bucket.
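
To make the layout above concrete, here is a minimal sketch of the path computation. The class
and method names (BucketPaths, bucketPath) and the sample values are hypothetical and only
illustrate the proposal; they are not taken from any YARN patch.

```java
// Hypothetical helper illustrating the proposed layout; not actual YARN code.
final class BucketPaths {
    private BucketPaths() {}

    /**
     * Builds aggregation_log_root/user/cluster_timestamp/(app_id / bucket_size).
     * Integer division puts app ids [k * bucketSize, (k + 1) * bucketSize) in bucket k,
     * so each bucket holds at most bucketSize app directories.
     */
    static String bucketPath(String root, String user, long clusterTimestamp,
                             int appId, int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        int bucket = appId / bucketSize; // no modulo component is needed
        return root + "/" + user + "/" + clusterTimestamp + "/" + bucket;
    }
}
```

For example, with a bucket size of 1000, app ids 12000 through 12999 all land in bucket 12.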

The log deletion service can clean up empty bucket directories when it removes the last app
from a directory.  It's a little tricky for the bucket delete case since we need to consider
the scenario where we want to delete just as a long-running app tries to aggregate to the
same bucket, but as long as the aggregation process creates the bucket if necessary and the
deletion service takes care to only delete empty bucket directories (i.e., no recursive delete),
we should be fine.
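
The two rules above (create the bucket if necessary, never delete a bucket recursively) can be
sketched as follows. This uses java.nio.file on a local filesystem purely as a stand-in for
HDFS, and the names BucketCleanup, createAppDir and deleteBucketIfEmpty are hypothetical. It
assumes the filesystem refuses a non-recursive delete of a non-empty directory, which is what
makes the race harmless.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical sketch on a local filesystem; the real services would go through HDFS.
final class BucketCleanup {
    private BucketCleanup() {}

    /** Aggregation side: creates the bucket if it is missing, then the app directory. */
    static Path createAppDir(Path bucket, String appDirName) {
        try {
            // createDirectories also creates missing parents and is a no-op for existing ones.
            return Files.createDirectories(bucket.resolve(appDirName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletion side: removes the bucket only if it is empty; never recursive. */
    static boolean deleteBucketIfEmpty(Path bucket) {
        try {
            Files.delete(bucket); // fails rather than removing children
            return true;
        } catch (DirectoryNotEmptyException | NoSuchFileException e) {
            // Either an app directory appeared in the meantime or the bucket is already gone.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the deletion wins the race, the aggregator simply recreates the bucket; if the aggregator
wins, the delete fails and the bucket stays in place.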

This should handle any app submission rate up to a point where we are aggregating the square
of bucket_size apps in the retention period.  Beyond that point we'd have more than bucket_size
bucket directories.
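
The arithmetic behind that bound, as a small sketch. The names below are hypothetical, and it
assumes the worst case where all retained apps fall under a single user and cluster timestamp
with contiguous app ids.

```java
// Hypothetical helper for the capacity estimate; not actual YARN code.
final class BucketCapacity {
    private BucketCapacity() {}

    /** Retained apps that fit before the number of bucket directories exceeds bucketSize. */
    static long maxAppsBeforeOverflow(long bucketSize) {
        // bucketSize buckets, each holding bucketSize app directories.
        return Math.multiplyExact(bucketSize, bucketSize);
    }

    /** Bucket directories needed for the given number of retained apps (rounded up). */
    static long bucketDirsNeeded(long retainedApps, long bucketSize) {
        return (retainedApps + bucketSize - 1) / bucketSize;
    }
}
```

With a bucket size of 1000, for instance, one million retained apps need exactly 1000 bucket
directories, and one more app pushes the count to 1001.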

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -------------------------------------------------------------
>
>                 Key: YARN-6929
>                 URL: https://issues.apache.org/jira/browse/YARN-6929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is not scalable.
> The maximum number of items per directory is 1048576 by default (HDFS-6102). With a
> yarn.log-aggregation.retain-seconds retention of 7 days, LogAggregationService is more
> likely to fail to create a new directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is <yarn.nodemanager.remote-app-log-dir>/<user>/logs/<job_name>.
> This can be improved by adding the date as a subdirectory, like
> <yarn.nodemanager.remote-app-log-dir>/<user>/logs/<date>/<job_name>

> {code}
> WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: Application failed to init aggregation
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 items=1048576
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 items=1048576
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
> {code}
> Thanks to Robert Mancuso for finding this issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

