hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
Date Sat, 21 Oct 2017 06:06:41 GMT

     [ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Varun Saxena updated YARN-6376:
    Fix Version/s: 2.9.0

> Exceptions caused by synchronous putEntities requests can be swallowed
> ----------------------------------------------------------------------
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>              Labels: atsv2-hbase, yarn-5355-merge-blocker
>             Fix For: 2.9.0, YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>         Attachments: YARN-6376.00.patch
> TimelineCollector.putEntities() is currently implemented by calling TimelineWriter.write()
followed by TimelineWriter.flush(). Because HBaseTimelineWriter.write() is an asynchronous operation,
a TimelineClient can send a synchronous putEntities() request for critical
data and never get back an exception, even though the HBase write request that stores the entities
may have failed.
> This is due to a race condition between the WriterFlushThread in TimelineCollectorManager
and the web threads handling synchronous putEntities() requests. The web thread first puts the
entities into the buffer. Before the web thread invokes writer.flush(), however, the
WriterFlushThread may fire and flush the writer. If the entities are not successfully written
to the backend during that flush, the WriterFlushThread simply logs an error, and
the web thread never gets an exception from its own writer.flush() invocation. This is
a problem because the reason TimelineClient sends putEntities() synchronously is to retry upon
any exception.
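
The interleaving described above can be reproduced with a minimal sketch. This is not the Hadoop code: BufferedWriterSketch is a hypothetical stand-in for an asynchronous, buffering writer, and the sketch assumes that a failed flush drains the buffer and reports the failure only to whichever caller triggered that flush. To keep the outcome deterministic, the periodic flush is simulated as an ordinary call placed between write() and the caller's own flush(), rather than run on a real thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SwallowedFlushDemo {

    /** Hypothetical buffering writer; not the real TimelineWriter API. */
    static class BufferedWriterSketch {
        private final List<String> buffer = new ArrayList<>();
        private final boolean backendHealthy;

        BufferedWriterSketch(boolean backendHealthy) {
            this.backendHealthy = backendHealthy;
        }

        /** Asynchronous-style write: only buffers, never touches the backend. */
        synchronized void write(String entity) {
            buffer.add(entity);
        }

        /**
         * Sends buffered entities to the backend. Assumption of this sketch:
         * the buffer is drained even when the backend rejects the data, so
         * only the caller of this particular flush learns about the failure.
         */
        synchronized void flush() throws IOException {
            if (buffer.isEmpty()) {
                return; // nothing pending, so nothing can fail
            }
            buffer.clear();
            if (!backendHealthy) {
                throw new IOException("backend write failed");
            }
        }
    }

    /**
     * Simulates one synchronous put against a failing backend and reports
     * whether the caller observed the failure.
     */
    static boolean callerSawFailure(boolean periodicFlushRunsFirst) {
        BufferedWriterSketch writer = new BufferedWriterSketch(false);
        writer.write("critical-entity");

        if (periodicFlushRunsFirst) {
            // Stand-in for the periodic flush thread: it logs and moves on.
            try {
                writer.flush();
            } catch (IOException e) {
                System.err.println("flush thread: " + e.getMessage());
            }
        }

        // The synchronous caller's own flush.
        try {
            writer.flush();
            return false; // caller believes the write succeeded
        } catch (IOException e) {
            return true;  // caller can retry
        }
    }

    public static void main(String[] args) {
        System.out.println("no competing flush: caller saw failure = "
            + callerSawFailure(false));
        System.out.println("flush thread ran first: caller saw failure = "
            + callerSawFailure(true));
    }
}
```

In the second case the caller's flush() finds an empty buffer and returns normally, so the failure is visible only in the flush thread's log. That is the swallowed exception this issue describes.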

