hadoop-mapreduce-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
Date Fri, 25 Sep 2015 18:37:04 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Robert Kanter updated MAPREDUCE-6480:
    Attachment: MAPREDUCE-6480.003.patch

003 patch addresses Anubhav's comments:

1. Yes.  For FAILED, we don't know why it failed, so I figured it was safest to just not do
anything to the logs.  For TIME_OUT, the idea seems to be that nothing bad happened, but we're
not expecting any more logs.

2. Done

3. This was generated by IntelliJ.  I'd rather leave it in as-is to be safe.

4. I've added a {{-verbose}} option, which prints out a lot more details about what's happening.

5. Done

6. Done

7. Done

8. I think we should do both.  You're right that we could break out if there's an error with
just one app.  At the same time, if the user running the tool doesn't have permission to access
the logs for a specific user, we don't want to keep trying over and over again.

8 (#2). Done

9. Done
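
A minimal sketch of one possible reading of items 1 and 8, not the code in the patch. Statuses are plain strings, `ELIGIBLE`, `is_eligible`, `check_apps` and the `check` callback are hypothetical names, and treating SUCCEEDED as eligible is an assumption drawn from the issue description. An error on a single app skips only that app; a permission error on a user's logs skips the rest of that user's apps instead of retrying.

```python
from typing import Callable, List, Set, Tuple

# Item 1: FAILED is left alone because the cause is unknown; TIME_OUT is
# treated as "nothing bad happened, no more logs expected".
# SUCCEEDED being eligible is an assumption of this sketch.
ELIGIBLE = {"SUCCEEDED", "TIME_OUT"}


def is_eligible(status: str) -> bool:
    """Return True if logs in this aggregation status may be processed."""
    return status in ELIGIBLE


def check_apps(
    apps: List[Tuple[str, str]],
    check: Callable[[str, str], None],
) -> List[str]:
    """Item 8: run check(user, app_id) on each app and collect the ones that pass.

    check is a hypothetical callback that raises on failure.
    """
    blocked_users: Set[str] = set()
    passed: List[str] = []
    for user, app_id in apps:
        if user in blocked_users:
            continue  # already known to be unreadable, do not retry
        try:
            check(user, app_id)
        except PermissionError:
            # No access to this user's logs: skip all of their remaining apps.
            blocked_users.add(user)
        except OSError:
            # Problem with this one app only: move on to the next app.
            continue
        else:
            passed.append(app_id)
    return passed
```

`PermissionError` is caught before `OSError` because it is a subclass of it in Python; with the order reversed, the per-user skip would never trigger.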

> archive-logs tool may miss applications
> ---------------------------------------
>                 Key: MAPREDUCE-6480
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: MAPREDUCE-6480.001.patch, MAPREDUCE-6480.002.patch, MAPREDUCE-6480.003.patch
> MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files.  It seeds the
> initial list of applications to process based on apps which have finished aggregating, according
> to the RM.  However, the RM doesn't remember completed applications forever (e.g. after a failover),
> so it's possible for the tool to miss applications if they're no longer in the RM.
> Instead, we should do the following:
> # Seed the initial list of apps based on the aggregated log directories
> # Make the RM not consider applications "complete" until their log aggregation has reached
> a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT).
> #2 will allow #1 to assume that any apps not found in the RM are done aggregating.  #1
> on its own should cover most cases though.
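
The two steps proposed above can be sketched roughly as follows. This is an illustration only, not the patch: the aggregated log directories are modeled as an in-memory mapping from user to application IDs, the RM as a mapping from application ID to status, and `seed_apps`, `done_aggregating` and the sample application IDs are hypothetical. The enum lists only RUNNING plus the four terminal states named in the description.

```python
from enum import Enum
from typing import Dict, List, Tuple


class LogAggregationStatus(Enum):
    """RUNNING plus the four terminal states named in the issue."""
    RUNNING = "RUNNING"
    DISABLED = "DISABLED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    TIME_OUT = "TIME_OUT"


TERMINAL = {
    LogAggregationStatus.DISABLED,
    LogAggregationStatus.SUCCEEDED,
    LogAggregationStatus.FAILED,
    LogAggregationStatus.TIME_OUT,
}


def seed_apps(log_dirs: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Step 1: build the initial (user, app_id) list from the log directories."""
    return [
        (user, app_id)
        for user in sorted(log_dirs)
        for app_id in sorted(log_dirs[user])
    ]


def done_aggregating(
    app_id: str, rm_apps: Dict[str, LogAggregationStatus]
) -> bool:
    """Decide whether log aggregation for app_id has finished."""
    status = rm_apps.get(app_id)
    if status is None:
        # Step 2 means the RM only forgets an app after its aggregation
        # reached a terminal state, so an unknown app counts as done.
        return True
    return status in TERMINAL


# Hypothetical sample data: the RM has already forgotten application_1.
log_dirs = {"alice": ["application_1", "application_2"], "bob": ["application_3"]}
rm_apps = {
    "application_2": LogAggregationStatus.RUNNING,
    "application_3": LogAggregationStatus.SUCCEEDED,
}
finished = [
    app_id for _, app_id in seed_apps(log_dirs) if done_aggregating(app_id, rm_apps)
]
```

Here `finished` contains `application_1` and `application_3`: the first because the RM no longer knows it, the second because its status is terminal.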

