hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
Date Mon, 05 Oct 2015 21:17:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944067#comment-14944067 ]

Li Lu commented on YARN-4061:

Thanks for the review [~sjlee0]! 

bq. Since the actual storage writer (HBase) always acts on this queue asynchronously, it seems
that the client cannot have a synchronous write semantics. Is that a correct reading? If so,
how would we implement such a synchronous write?

This is definitely a valid concern. Yes, having purely synchronous semantics with this design
is hard. To support synchronous semantics, we generally have two options:
- On synchronous calls, we not only enforce a flush but also block until the data is actually
persisted to HBase. The advantage of this design is simplicity, but if the HBase storage is
unavailable, we cannot perform any synchronous calls. This makes the "fault tolerant" feature
less appealing.
- Since we know (and trust) that data on HDFS will eventually be available in HBase, maybe
we can have a FT reader that checks HDFS when or before it checks HBase? That way we can
always select the most up-to-date data, whether it is in HDFS or in HBase. The shortcoming of
this approach is that local file storage will not work here, because that buffered data is
generally not available to other nodes (and I wonder whether this strong consistency model is
too ambitious given the amount of data).
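
To make the two options concrete, here is a minimal in-memory sketch. It is not code from the
patch or from the YARN code base: the class and method names are hypothetical, a queue stands
in for the shared HDFS buffer, and a map stands in for HBase. `writeSync` illustrates the first
option (block until the data reaches the backend, and fail if the backend never catches up),
and `read` illustrates the second (check the buffer first, then the backend, and keep the
newer record).

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names come from the YARN code base. */
public class FaultTolerantTimelineSketch {

  /** A versioned record; the higher timestamp is treated as more up to date. */
  static final class Entity {
    final String id;
    final long timestamp;
    final String payload;

    Entity(String id, long timestamp, String payload) {
      this.id = id;
      this.timestamp = timestamp;
      this.payload = payload;
    }
  }

  /** A buffered entity plus a future completed once it reaches the backend. */
  private static final class Pending {
    final Entity entity;
    final CompletableFuture<Void> done = new CompletableFuture<>();

    Pending(Entity entity) {
      this.entity = entity;
    }
  }

  // Assumption: stand-in for the shared buffer (HDFS in the discussion).
  private final Queue<Pending> buffer = new ConcurrentLinkedQueue<>();
  // Assumption: stand-in for the backend storage (HBase in the discussion).
  private final Map<String, Entity> backend = new ConcurrentHashMap<>();

  private static Entity newer(Entity a, Entity b) {
    return a.timestamp >= b.timestamp ? a : b;
  }

  /** Asynchronous write: buffer the entity and return immediately. */
  CompletableFuture<Void> writeAsync(Entity e) {
    Pending p = new Pending(e);
    buffer.add(p);
    return p.done;
  }

  /** Option 1: block until the entity is in the backend; false on timeout. */
  boolean writeSync(Entity e, long timeoutMillis) throws Exception {
    try {
      writeAsync(e).get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException ex) {
      return false; // backend did not catch up; the entity stays buffered
    }
  }

  /** Simulates the asynchronous storage writer (assumes a single flusher). */
  void flush() {
    Pending p;
    while ((p = buffer.peek()) != null) {
      // Persist before removing from the buffer so readers never miss it.
      backend.merge(p.entity.id, p.entity, FaultTolerantTimelineSketch::newer);
      buffer.poll();
      p.done.complete(null);
    }
  }

  /** Option 2: check the buffer first, then the backend; keep the newer one. */
  Optional<Entity> read(String id) {
    Entity best = null;
    for (Pending p : buffer) {
      if (p.entity.id.equals(id)) {
        best = (best == null) ? p.entity : newer(best, p.entity);
      }
    }
    Entity stored = backend.get(id);
    if (stored != null) {
      best = (best == null) ? stored : newer(best, stored);
    }
    return Optional.ofNullable(best);
  }

  public static void main(String[] args) throws Exception {
    FaultTolerantTimelineSketch s = new FaultTolerantTimelineSketch();
    s.writeAsync(new Entity("app_1", 1L, "v1"));
    s.flush();                                   // v1 reaches the backend
    s.writeAsync(new Entity("app_1", 2L, "v2")); // v2 is only buffered
    System.out.println(s.read("app_1").get().payload);
    // No flusher is running, so the synchronous write times out.
    System.out.println(s.writeSync(new Entity("app_2", 1L, "x"), 50L));
  }
}
```

Reading the buffer before the backend matters in this sketch: the flusher persists an entity
before removing it from the buffer, so a reader that scans in that order cannot miss an entity
that is mid-flush. A buffer on local disk could not play this role, since other nodes would
have no way to scan it.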

About throughput, I agree we need to be careful here. Our traffic may be similar in scale
and flow to that of the MapReduce JobHistory server (JHS). If so, I think we can start with
some of the ideas in the JHS.

> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>                 Key: YARN-4061
>                 URL: https://issues.apache.org/jira/browse/YARN-4061
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: FaulttolerantwriterforTimelinev2.pdf
> We need to build a timeline writer that is resistant to backend storage downtime
and timeline collector failures.
