hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4705) ATS 1.5 parse pipeline to consider handling open() events recoverably
Date Mon, 22 Feb 2016 17:01:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157273#comment-15157273 ]

Jason Lowe commented on YARN-4705:
----------------------------------

bq. adding a check to skip 0-byte files stops the stack I've been seeing from coming back...

The issue with checking the file size first is that the ATS could refuse to open a file that
already has data in it. That adds extra delay between the application writing the data and the
data appearing in the ATS. Because we're reading a file that is still being written, the reported
file size may not reflect how much data is actually there. This is especially true on file systems
like HDFS, where the file size is only updated when new blocks are allocated.
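Here is a minimal sketch of the point above. It assumes a hypothetical LogSource interface, which is not an ATS or HDFS API, whose reported length can lag behind the bytes that are actually readable. Under that assumption, a size check skips a file that an open-first reader would read in full:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReportedLengthDemo {
    /** Hypothetical view of a log file that another process is still appending to. */
    interface LogSource {
        long reportedLength();                 // length the file system currently reports
        InputStream open() throws IOException; // stream over the bytes actually readable
    }

    /** Size-gated strategy: skips the file whenever the reported length is zero. */
    static int readWithSizeGate(LogSource src) throws IOException {
        if (src.reportedLength() == 0) {
            return 0; // skipped: data already written has to wait for a later scan
        }
        return drain(src);
    }

    /** Open-first strategy: ignores the reported length and reads whatever the stream returns. */
    static int readOpenFirst(LogSource src) throws IOException {
        return drain(src);
    }

    /** Reads the stream to EOF and returns the number of bytes seen. */
    private static int drain(LogSource src) throws IOException {
        int total = 0;
        try (InputStream in = src.open()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Hypothetical helper: a source whose reported length lags behind the data it serves. */
    static LogSource laggingSource(byte[] data, long reportedLength) {
        return new LogSource() {
            public long reportedLength() { return reportedLength; }
            public InputStream open() { return new ByteArrayInputStream(data); }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"entity\":\"e1\"}\n".getBytes(StandardCharsets.UTF_8);
        LogSource src = laggingSource(data, 0); // 16 bytes written, 0 reported
        System.out.println("size-gated bytes read: " + readWithSizeGate(src)); // 0
        System.out.println("open-first bytes read: " + readOpenFirst(src));    // 16
    }
}
```

Running main prints 0 for the size-gated reader and 16 for the open-first reader. The data is identical in both cases; only the decision to trust the reported length differs.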

> ATS 1.5 parse pipeline to consider handling open() events recoverably
> ---------------------------------------------------------------------
>
>                 Key: YARN-4705
>                 URL: https://issues.apache.org/jira/browse/YARN-4705
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> During one of my own timeline test runs, I've been seeing a stack trace warning that
> the CRC check failed in FileSystem.open() on a file, something the FS was ignoring.
> Even though it's swallowed (and probably not the cause of my test failure), the code in
> {{LogInfo.parsePath()}} treats a failure to open a file as unrecoverable.

> On some filesystems this may not be the case: if a file is open for writing, it may not
> yet be available for reading, and checksums may be a similar issue.
> Perhaps a failure at open() should be viewed as recoverable while the app is still running?
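Here is one way to act on this suggestion, sketched under stated assumptions. The Outcome enum, the Opener interface and the tryParse method below are all hypothetical and are not the actual {{LogInfo.parsePath()}} code. The sketch also assumes the caller rescans the file on a later pass. The idea is that an IOException while opening the file is mapped to a retry while the application is still running, and only becomes a hard failure once the application has finished.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RecoverableOpenSketch {
    /** Hypothetical result of one attempt to parse a timeline log file. */
    enum Outcome { PARSED, RETRY_LATER, FAILED }

    /** Hypothetical opener; a real reader would wrap something like FileSystem.open(). */
    interface Opener {
        InputStream open(String path) throws IOException;
    }

    /**
     * Maps an IOException from opening (or closing) the file to a retry while the
     * application is still running, and to a permanent failure once it has finished.
     */
    static Outcome tryParse(Opener opener, String path, boolean appRunning) {
        try (InputStream in = opener.open(path)) {
            // A real implementation would parse entities from `in` here.
            return Outcome.PARSED;
        } catch (IOException e) {
            return appRunning ? Outcome.RETRY_LATER : Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        Opener failing = p -> { throw new IOException("checksum error opening " + p); };
        Opener working = p -> new ByteArrayInputStream(new byte[0]);
        System.out.println(tryParse(failing, "entitylog-1", true));  // RETRY_LATER
        System.out.println(tryParse(failing, "entitylog-1", false)); // FAILED
        System.out.println(tryParse(working, "entitylog-1", true));  // PARSED
    }
}
```

Running main prints RETRY_LATER, FAILED and PARSED. A real implementation would probably also want to bound the number of retries, so that a file that can never be read is not reopened on every scan.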



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
