hadoop-mapreduce-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
Date Wed, 02 Sep 2015 08:43:46 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-6415:
-------------------------------------
    Attachment: MAPREDUCE-6415_branch-2.002.patch
                MAPREDUCE-6415.002.patch

Thanks for the review [~jlowe]!

The 002 patch addresses most of the issues Jason brought up:
- fixes dependencies, though I had to keep some of the ones that maven didn't think it needed
- fixes usage output to use variables for the defaults.  I also changed the units for the
max total logs size to megabytes instead of bytes to be easier to use.
- now SUCCEEDED and FAILED log aggregation statuses are considered.
- improves checkFiles to be more efficient
- if maxEligible is 0, it will now print out a message and exit right away.  I think having
0 be equivalent to all might be confusing?  I'm fine either way; let me know if you think
it's better to treat it as equivalent to a negative value.
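
To make the maxEligible and units changes above concrete, here is a rough sketch of the behavior as described. It is not the code from the patch: EligibilitySketch, AggStatus, App, megabytesToBytes and selectEligible are hypothetical names, and taking the first N applications in input order is an assumption made only for illustration. Negative values are treated as "no limit", as implied above.

```java
import java.util.ArrayList;
import java.util.List;

class EligibilitySketch {
    // Hypothetical stand-in for the log aggregation status of an application.
    enum AggStatus { SUCCEEDED, FAILED, RUNNING, NOT_START }

    // Hypothetical, minimal view of one application's aggregated logs.
    static final class App {
        final String id;
        final AggStatus status;

        App(String id, AggStatus status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Converts the user-facing megabyte value to bytes. */
    static long megabytesToBytes(long megabytes) {
        return megabytes * 1024L * 1024L;
    }

    /**
     * Returns the applications to process.
     * maxEligible == 0 selects nothing (the caller would print a message and exit);
     * a negative maxEligible means "no limit".
     */
    static List<App> selectEligible(List<App> apps, int maxEligible) {
        List<App> eligible = new ArrayList<>();
        if (maxEligible == 0) {
            return eligible;
        }
        for (App app : apps) {
            // Only applications whose log aggregation has finished are considered.
            boolean finished = app.status == AggStatus.SUCCEEDED
                    || app.status == AggStatus.FAILED;
            if (!finished) {
                continue;
            }
            if (maxEligible > 0 && eligible.size() >= maxEligible) {
                break;
            }
            eligible.add(app);
        }
        return eligible;
    }
}
```

With this shape, switching 0 to mean "all" would only require changing the first check, which is why I'm fine either way.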

I don't think we should add a unique ID to the working directory.  The tool won't work correctly
with simultaneous runs anyway because it doesn't acquire any sort of "lock" that would stop
another instance from trying to process the same application's logs.  As it is now, by using
a non-unique directory, anything left over will get cleaned up when you run the tool again
(presumably, you're running it at some interval).
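
As a rough illustration of the cleanup-on-rerun behavior described above, the sketch below resets a fixed working directory at startup. It is only a sketch under assumptions: the real tool works against HDFS, whereas this uses the local filesystem through java.nio as a stand-in, and WorkingDirSketch and resetWorkingDir are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class WorkingDirSketch {
    /**
     * Deletes anything left over in the fixed working directory by a previous
     * run, then recreates the directory empty and returns it.
     */
    static Path resetWorkingDir(Path workingDir) {
        try {
            if (Files.exists(workingDir)) {
                try (Stream<Path> entries = Files.walk(workingDir)) {
                    // Reverse order visits children before their parent directories.
                    entries.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.delete(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                }
            }
            return Files.createDirectories(workingDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the path is the same on every run, leftovers from a failed run are removed the next time the tool starts. A per-run unique ID would leave them behind.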

On that last point, it would be good if we could prevent two instances of the tool from running
at the same time.  I think the best way to do this (without using a lock) is for the tool to check
for a RUNNING job named "ArchiveLogs" in the RM, though this won't protect against all situations
and will have a false positive if the user has another job named "ArchiveLogs".
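
The check proposed above might look roughly like the following. This is a sketch, not part of the patch: State and AppInfo are hypothetical stand-ins for whatever the RM reports (in a real tool the list would presumably come from something like YarnClient.getApplications restricted to RUNNING applications), and SingleInstanceSketch and anotherInstanceRunning are made-up names.

```java
import java.util.List;

class SingleInstanceSketch {
    // Hypothetical stand-in for the application states reported by the RM.
    enum State { ACCEPTED, RUNNING, FINISHED }

    // Hypothetical, minimal view of an application report.
    static final class AppInfo {
        final String name;
        final State state;

        AppInfo(String name, State state) {
            this.name = name;
            this.state = state;
        }
    }

    static final String JOB_NAME = "ArchiveLogs";

    /**
     * Returns true if any RUNNING application is named "ArchiveLogs".
     * This matches on the name only, so an unrelated job with the same
     * name produces a false positive.
     */
    static boolean anotherInstanceRunning(List<AppInfo> apps) {
        for (AppInfo app : apps) {
            if (app.state == State.RUNNING && JOB_NAME.equals(app.name)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that this is a check-then-act pattern rather than a lock: two instances started at nearly the same moment could both pass the check before either job reaches RUNNING, which is one of the situations it does not protect against.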

> Create a tool to combine aggregated logs into HAR files
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6415
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, MAPREDUCE-6415.002.patch,
> MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2.002.patch, MAPREDUCE-6415_branch-2_prelim_001.patch,
> MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch
>
>
> While we wait for YARN-2942 to become viable, it would still be great to improve the
> aggregated logs problem.  We can write a tool that combines aggregated log files into a single
> HAR file per application, which should solve the too many files and too many blocks problems.
> See the design document for details.
> See YARN-2942 for more context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
