hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4705) ATS 1.5 parse pipeline to consider handling open() events recoverably
Date Mon, 22 Feb 2016 16:52:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157262#comment-15157262
] 

Jason Lowe commented on YARN-4705:
----------------------------------

ATS 1.5 relies on reading files that are actively being written, and there are inherent errors
that can occur during those scenarios.  I believe the error above is aborting the current
attempt to read the summary file, but it should be trying again shortly later.  For active
applications it essentially is polling for more events and will keep trying to read both the
summary file and detailed file(s), seeking past the amount of data successfully read so far.

> ATS 1.5 parse pipeline to consider handling open() events recoverably
> ---------------------------------------------------------------------
>
>                 Key: YARN-4705
>                 URL: https://issues.apache.org/jira/browse/YARN-4705
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> During one of my own timeline test runs, I've been seeing a stack trace warning that
the CRC check failed in Filesystem.open() file; something the FS was ignoring.
> Even though its swallowed (and probably not the cause of my test failure), looking at
the code in {{LogInfo.parsePath()}} that it considers a failure to open a file as unrecoverable.

> on some filesystems, this may not be the case, i.e. if its open for writing it may not
be available for reading; checksums maybe a similar issue. 
> Perhaps a failure at open() should be viewed as recoverable while the app is still running?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message