eagle-issues mailing list archives

From "wujinhu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (EAGLE-920) mr failed job trouble shooting
Date Wed, 22 Feb 2017 03:55:44 GMT
wujinhu created EAGLE-920:

             Summary: mr failed job trouble shooting
                 Key: EAGLE-920
                 URL: https://issues.apache.org/jira/browse/EAGLE-920
             Project: Eagle
          Issue Type: Improvement
          Components: App::Job Performance Monitor
    Affects Versions: v0.5.0
            Reporter: wujinhu
            Assignee: wujinhu
             Fix For: v0.5.0

We follow the steps below when we find a failed MR job.
1. Get the error category distribution of the job via the API:
query=TaskAttemptErrorCategoryService[@site="sandbox" and @jobId="job_1486726244016_162594"]<@errorCategory>{count}
2. Get the mapping from error category to error message, together with the list of failed task attempts:
query=JobErrorMappingService[@site="sandbox" and @jobId="job_1486726244016_162594" and @errorCategory="java.lang.RuntimeException"]
3. Dive into one task attempt:
query=TaskAttemptExecutionService[@site="sandbox" and @taskAttemptId="attempt_1486726244016_162594_m_002451_1"]
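
The three queries above can be assembled programmatically. The following is a minimal sketch in Python, not part of the issue itself: the helper names build_query and query_url are hypothetical, and BASE_URL is a placeholder whose host, port and path are assumptions that must be replaced with the query endpoint of the actual Eagle deployment. Only the service names, field names and IDs come from the steps above.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the real Eagle query URL.
BASE_URL = "http://eagle.example.com/rest/entities"


def build_query(service, filters, group_by=None, aggregate=None):
    """Build a query string of the form Service[@k="v" and ...]<@field>{agg}."""
    conditions = " and ".join(f'@{key}="{value}"' for key, value in filters.items())
    query = f"{service}[{conditions}]"
    if group_by and aggregate:
        # Optional group-by field and aggregate function, as used in step 1.
        query += f"<@{group_by}>{{{aggregate}}}"
    return query


def query_url(base_url, query):
    """Percent-encode the query and attach it as the 'query' parameter."""
    return f"{base_url}?{urlencode({'query': query})}"


site = "sandbox"
job_id = "job_1486726244016_162594"

# Step 1: error category distribution of the job.
step1 = build_query(
    "TaskAttemptErrorCategoryService",
    {"site": site, "jobId": job_id},
    group_by="errorCategory",
    aggregate="count",
)

# Step 2: error messages and failed task attempts for one error category.
step2 = build_query(
    "JobErrorMappingService",
    {"site": site, "jobId": job_id, "errorCategory": "java.lang.RuntimeException"},
)

# Step 3: details of a single task attempt.
step3 = build_query(
    "TaskAttemptExecutionService",
    {"site": site, "taskAttemptId": "attempt_1486726244016_162594_m_002451_1"},
)

for step in (step1, step2, step3):
    print(query_url(BASE_URL, step))
```

Percent-encoding matters here because the queries contain brackets, quotes, braces and spaces, which are not safe to send unescaped in a URL.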

