hadoop-yarn-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
Date Mon, 22 Feb 2016 17:56:18 GMT

     [ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Steve Loughran updated YARN-4696:
    Attachment: YARN-4696-006.patch

Patch 006; ongoing (and currently unsuccessful) attempt to use file:// as a destination for
timeline entities

* better logging of read problems, to differentiate an empty file from a missing file.
* add cleanup of TimelineDataManager in try-with-resources
* explicitly throw an FNFE if the active dir isn't found (rather than a generic IOE)
* the constant {{FileSystemTimelineWriter.TIMELINE_SERVICE_ENTITYFILE_FS_SUPPORT_APPEND}}
is public, so that you can turn off append support. I know we want a proper API here (HADOOP-9565),
but it's not done yet: a flag is all you have. Making the constant public will make it easier
to track down use in future.
* includes YARN-4716; flush() interface. This propagates all the way down to the FS API (good),
but as file:// is a CRC filesystem, flush/hflush doesn't actually work (it buffers until a
CRC-block of data is ready). And there's no way to turn off that feature via a config option.
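
As a rough illustration of the first and third bullets, here is a minimal sketch using only java.nio against a local path. The class, enum, and method names are hypothetical and are not taken from the patch, which works against the Hadoop FileSystem API rather than java.nio.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; names are illustrative and not from the patch. */
final class LogFileProbe {
  enum Status { MISSING, EMPTY, HAS_DATA }

  /** Report a missing file and an empty file as distinct outcomes. */
  static Status probe(Path file) throws IOException {
    if (!Files.exists(file)) {
      return Status.MISSING;
    }
    // If the file is deleted between the two calls, Files.size throws
    // NoSuchFileException, which is still an IOException for the caller.
    return Files.size(file) == 0 ? Status.EMPTY : Status.HAS_DATA;
  }

  /** Fail with a specific FileNotFoundException rather than a generic IOException. */
  static Path requireActiveDir(Path activeDir) throws FileNotFoundException {
    if (!Files.isDirectory(activeDir)) {
      throw new FileNotFoundException("Active directory not found: " + activeDir);
    }
    return activeDir;
  }
}
```

A caller can then log MISSING and EMPTY differently, and catch FileNotFoundException separately from other I/O failures.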

What I'm seeing, then, is that when an app completes, its changes are picked up fine. But incomplete
apps aren't: instead the scanner sees a 0-byte file and skips it, which isn't
useful at all.
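
One way to picture the symptom above is a toy stream that only emits whole chunks. This is a simplified model written for illustration, not Hadoop's checksum filesystem code; the class name, the demo method, and the chunk size are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Toy model only: NOT Hadoop's checksum filesystem implementation. */
final class ChunkBufferedStream extends OutputStream {
  private final OutputStream sink;
  private final byte[] chunk;
  private int used;

  ChunkBufferedStream(OutputStream sink, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.sink = sink;
    this.chunk = new byte[chunkSize];
  }

  @Override
  public void write(int b) throws IOException {
    chunk[used++] = (byte) b;
    if (used == chunk.length) {
      emitChunk(); // only a complete chunk reaches the sink
    }
  }

  /** In this model flush() cannot emit a partial chunk, so buffered bytes stay hidden. */
  @Override
  public void flush() throws IOException {
    sink.flush();
  }

  /** Closing writes out whatever is left, which is why completed files look fine. */
  @Override
  public void close() throws IOException {
    if (used > 0) {
      emitChunk();
    }
    sink.close();
  }

  private void emitChunk() throws IOException {
    sink.write(chunk, 0, used);
    used = 0;
  }

  /** Demo: how many bytes a reader of the sink could see after n writes and a flush. */
  static int visibleAfterFlush(int chunkSize, int n) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    ChunkBufferedStream out = new ChunkBufferedStream(sink, chunkSize);
    for (int i = 0; i < n; i++) {
      out.write(i);
    }
    out.flush();
    return sink.size();
  }
}
```

Under this model, a writer that has produced less than one chunk and then flushed still leaves a 0-byte file for any reader, while a closed file is complete.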

I suspect the issue here is HDFS vs file:// filesystem behaviour, something I could fix by moving
to miniHDFS. My fear here is that people may want to use file:// or a similar FS in production,
and what we have today doesn't work.

> EntityGroupFSTimelineStore to work in the absence of an RM
> ----------------------------------------------------------
>                 Key: YARN-4696
>                 URL: https://issues.apache.org/jira/browse/YARN-4696
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-4696-001.patch, YARN-4696-002.patch, YARN-4696-003.patch, YARN-4696-005.patch,
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with the configuration
pointing to it. This is a new change, and impacts testing where you have historically been
able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is running; it
falls back to "unknown" if not. If the RM connection were optional, the "unknown" codepath
could be called directly, relying on age of file as a metric of completion.
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to
> # disable retries on yarn client IPC; if it fails, tag app as unknown.
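
The "unknown" fallback described in the issue could look roughly like the sketch below, which treats file age as the completion signal when no RM answer is available. The enum, method, and parameter names are hypothetical, and the threshold is an assumed tunable rather than an existing configuration option.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the "unknown" codepath; not the store's actual code. */
final class AppStateFallback {
  enum AppState { ACTIVE, COMPLETED, UNKNOWN }

  /**
   * Decide whether an app should be treated as completed.
   * With a definite RM answer, use it. With UNKNOWN (for example when no RM
   * is configured or the probe failed), fall back to the age of the log file.
   */
  static boolean looksCompleted(AppState rmState, Instant lastModified,
                                Instant now, Duration idleThreshold) {
    switch (rmState) {
      case COMPLETED:
        return true;
      case ACTIVE:
        return false;
      default:
        // UNKNOWN: a file untouched for longer than the threshold is assumed done.
        return Duration.between(lastModified, now).compareTo(idleThreshold) > 0;
    }
  }
}
```

Each of the three options listed above would end up on the UNKNOWN branch; they differ only in how the store decides to skip or give up on the RM probe.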

This message was sent by Atlassian JIRA
