hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3143) Complete aggregation of user-logs spit out by containers onto DFS
Date Wed, 05 Oct 2011 13:07:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120921#comment-13120921 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3143:

Just to summarize the design of the current system, already implemented (but disabled) in YARN
NodeManagers, and the gaps we need to fill in:

 - NM uploads the logs of all the containers of an App into a single file on HDFS named by
node-id in a per-app directory. So for an app, there are a maximum of N log files, N being
the number of nodes in the system.
 - NM starts streaming a container's logs to the file once a container finishes.
 - On app-finish, the NM flushes all containers' logs and closes the per-app, per-node file.
 - The NM removes the local container-logs on app-finish, once the aggregated file is closed.
 - The log format is a TFile. Keys are container-ids. Values are lists of compound records:
the file-type (syslog/stdout/stderr) followed by the actual container log-file contents.
 - TODO: As of today, the NM silently ignores any failures during log upload. It could increment
a counter for these failures, or maintain a per-app list of the containers whose logs it failed
to upload.
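
The per-app, per-node layout above can be sketched in plain Java. The path scheme, class name, and pair representation here are illustrative assumptions for clarity, not the actual YARN classes, TFile API, or on-disk directory names:

```java
import java.util.*;

// Hypothetical model of one aggregated log file: keyed by container-id,
// each value a list of (log-type, contents) pairs -- mirroring the TFile
// layout described above, not the real serialized format.
class AggregatedLogFile {
    private final Map<String, List<String[]>> byContainer = new TreeMap<>();

    void append(String containerId, String logType, String contents) {
        byContainer.computeIfAbsent(containerId, k -> new ArrayList<>())
                   .add(new String[] { logType, contents });
    }

    // One aggregated file per node per app, so the HDFS name only needs the
    // per-app directory and the node-id (this layout is an assumption).
    static String pathFor(String remoteRoot, String user,
                          String appId, String nodeId) {
        return remoteRoot + "/" + user + "/" + appId + "/" + nodeId;
    }

    Set<String> containerIds() { return byContainer.keySet(); }
}
```

With this model, an app on N nodes yields at most N such files, each holding every container that ran on that node.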

Coverage of logs: In most cases, we don't need to upload the logs of all the containers.
 - Options include
    -- only AM logs will be uploaded onto the HDFS for any app.
    -- only AM logs + only failed containers' logs
    -- AM logs + failed containers' logs + x% of successful containers
    -- All logs
 - The above policies are already implemented by LogAggregationService, but they need
to be user-configurable: TODO.
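
The four coverage options can be modeled as a selectable policy. The enum names and the sampling mechanism below are illustrative guesses, not the actual LogAggregationService configuration:

```java
import java.util.Random;

// Sketch of the four coverage options above. A container's logs are
// uploaded if the chosen policy says so; "x% of successful containers"
// is modeled here as random sampling (an assumption).
enum LogRetentionPolicy {
    AM_ONLY, AM_AND_FAILED, AM_FAILED_AND_SAMPLE, ALL;

    boolean shouldUpload(boolean isAM, boolean failed,
                         double samplePercent, Random rng) {
        switch (this) {
            case AM_ONLY:       return isAM;
            case AM_AND_FAILED: return isAM || failed;
            case AM_FAILED_AND_SAMPLE:
                // x% of successful containers, chosen at random
                return isAM || failed
                    || rng.nextDouble() * 100 < samplePercent;
            case ALL:
            default:            return true;
        }
    }
}
```

Making this user-configurable would then reduce to parsing one setting into the enum.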

Web Serving
 - NM serves the log files of a container till the app finishes. NM doesn't keep any indices;
it simply prints the logs as plain files, one after another, possibly with a header for
each log-type.
 - TODO: After the upload finishes, NM will point the user to a configured log-server location.
This is mostly the same as JobHistory server.
 - TODO: For MapReduce: after the app finishes, when users visit their job-history, servlets
will parse the aggregated file and present the logs per container.
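
The "print logs one after another, with headers" rendering can be sketched as follows; the header text is an assumption, not what the NM web UI actually emits:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the NM-side rendering: concatenate a container's log files,
// prefixing each with a header naming the log-type. Header format is a
// guess for illustration.
class LogRenderer {
    static String render(Map<String, String> logsByType) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : logsByType.entrySet()) {
            out.append("==== ").append(e.getKey()).append(" ====\n");
            out.append(e.getValue()).append("\n");
        }
        return out.toString();
    }
}
```

A job-history servlet would do the same, except reading the entries out of the aggregated per-node file instead of local files.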

Command-line user interface
 - A dumper is already included for the clients.
   -- Command line is like so, for all container-logs of a single app
      ./yarn/bin/yarn logs -applicationId application_1304487270789_0001
    -- Command line is like so, for a single container's logs
      ./yarn/bin/yarn logs -applicationId application_1304487270789_0001 -containerId container_1304487270789_0001_000002
 - TODO: We need a MapReduce-specific command line that takes in a TaskAttemptID and returns
the corresponding logs.

Life on HDFS
 - The log file per-app per-node goes into a system log-dir and is written with user's credentials.
 - The log-dir is per-user and has quotas specified by admins. For now, the quota is the
same reasonable value for all users.
 - Tooling like dfs -cat or dfs -text lets users print out their logs, depending on the
log format above.
 - Admins can have scripts to garbage-collect/HAR logs in the user-dir that have aged beyond
a certain time period (e.g. 15 days).
 - TODO: What is the behaviour when user-quotas are hit? Fail aggregation and skip container-logs?
How does the user come to know?
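
The admin garbage-collection rule above reduces to an age check on each aggregated-log directory; a minimal sketch, where 15 days is the example threshold from the text and timestamps are epoch millis as HDFS FileStatus reports them:

```java
import java.util.concurrent.TimeUnit;

// Decide whether an aggregated-log path has aged beyond the retention
// window and should be garbage-collected (or HAR'ed).
class LogJanitor {
    static boolean shouldCollect(long modificationTimeMs, long nowMs,
                                 long maxAgeDays) {
        long ageMs = nowMs - modificationTimeMs;
        return ageMs > TimeUnit.DAYS.toMillis(maxAgeDays);
    }
}
```

An admin script would list the per-user log-dirs, apply this check to each app directory, and delete or archive the old ones.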
> Complete aggregation of user-logs spit out by containers onto DFS
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-3143
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3143
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.23.0
> Already implemented the feature for handling user-logs spit out by containers in NodeManager.
But the feature is currently disabled due to user-interface issues.
> This is the umbrella ticket for tracking the pending bugs w.r.t putting container-logs
on DFS.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

