apex-dev mailing list archives

From "Chandni Singh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2223) Managed state should parallelize WAL writes
Date Fri, 16 Sep 2016 16:59:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496811#comment-15496811 ]

Chandni Singh commented on APEXMALHAR-2223:

The reason for deferring the check to endWindow is that otherwise every event would have to
compare the bucket size with the threshold. This comparison, though not very expensive, only
needs to be done at intervals, because it takes some time for a bucket to reach the threshold.
The interval of an application window seemed a fair choice to me.
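A minimal Java sketch of the deferred check, assuming a hypothetical per-bucket buffer (`WindowedWalBuffer`, `put`, `endWindow`, and the character-count threshold are illustrative names, not the Malhar Managed State API). `put` only buffers the event, keeping the latest value per key, and the size comparison happens once per application window:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-bucket buffer: not the Malhar Managed State API.
final class WindowedWalBuffer {
  private final Map<String, String> pending = new LinkedHashMap<>(); // latest value per key
  private final DataOutputStream wal;   // stand-in for the WAL output stream
  private final long thresholdChars;    // assumed tunable spill threshold
  private long pendingChars;            // approximate size of buffered data
  private int spills;                   // how many times the buffer was written out

  WindowedWalBuffer(OutputStream walStream, long thresholdChars) {
    this.wal = new DataOutputStream(walStream);
    this.thresholdChars = thresholdChars;
  }

  // Called for every event: buffers only, no threshold comparison here.
  void put(String key, String value) {
    String previous = pending.put(key, value);
    if (previous == null) {
      pendingChars += key.length() + value.length();
    } else {
      pendingChars += value.length() - previous.length(); // overwrite replaces the old value
    }
  }

  // Called once per application window: the only place the size is compared.
  void endWindow() {
    if (pendingChars < thresholdChars) {
      return;
    }
    try {
      for (Map.Entry<String, String> e : pending.entrySet()) {
        wal.writeUTF(e.getKey());
        wal.writeUTF(e.getValue());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    pending.clear();
    pendingChars = 0;
    spills++;
  }

  int spillCount() { return spills; }

  int pendingEntries() { return pending.size(); }
}
```

Keeping only the latest value per key also means that repeated writes to the same key within one interval produce a single WAL record, which addresses the duplicate concern raised in the issue.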

When the buffer of the output stream is full, the data is flushed automatically. Explicitly
calling flush forces out any buffered data, so for us this forced flush can be invoked in
beforeCheckpoint(); we do not need to call it in every endWindow. This works because the
component relies on the checkpointed state: after a failure, the WAL is truncated to the
offset saved in that state.
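The flush-and-truncate contract can be sketched against a local file, assuming a hypothetical `CheckpointedWal` class; a real WAL on HDFS would go through the Hadoop FileSystem API rather than `java.nio`. Appends go through a buffered stream that writes itself out when full, `beforeCheckpoint()` forces the remaining bytes out and returns the offset to store in the checkpoint, and reopening with that offset discards anything written after it:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-file WAL illustrating flush-on-checkpoint and truncate-on-recovery.
final class CheckpointedWal implements AutoCloseable {
  private final FileChannel channel;
  private final OutputStream out;
  private long offset; // bytes accepted so far, including any still in the buffer

  // Opens the WAL for a (re)started operator, discarding bytes past the checkpointed offset.
  CheckpointedWal(Path path, long checkpointedOffset) throws IOException {
    channel = FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    channel.truncate(checkpointedOffset); // no-op if the file is already this short
    channel.position(checkpointedOffset);
    // The buffered stream writes to the channel by itself whenever its 64 KiB buffer fills.
    out = new BufferedOutputStream(Channels.newOutputStream(channel), 64 * 1024);
    offset = checkpointedOffset;
  }

  void append(byte[] record) throws IOException {
    out.write(record);
    offset += record.length;
  }

  // Forces buffered bytes to the file and returns the offset to save in checkpoint state.
  long beforeCheckpoint() throws IOException {
    out.flush();
    channel.force(false);
    return offset;
  }

  @Override
  public void close() throws IOException {
    out.close(); // flushes and closes the underlying channel
  }
}
```

Because recovery always truncates back to the saved offset, bytes that reached the file between checkpoints (whether from an automatic buffer flush or not) are harmless: they are simply discarded and replayed.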

> Managed state should parallelize WAL writes
> -------------------------------------------
>                 Key: APEXMALHAR-2223
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2223
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Thomas Weise
>            Assignee: Chandni Singh
> Currently, data is accumulated in memory and written to the WAL only on checkpoint. This
> causes a write spike on checkpoint and does not utilize the HDFS write pipeline. The other
> extreme is writing to the WAL as soon as data arrives and flushing only in beforeCheckpoint.
> The downside of this is that when the same key is written many times, all duplicates will
> be in the WAL. We need to find a balanced approach that the user can potentially fine-tune.
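To make the trade-off in the issue concrete, the following hypothetical simulation (`WalTradeoff` and its methods are illustrative names, not Malhar code) counts WAL records when keys are deduplicated in memory for a configurable number of application windows before being written, compared with writing every event immediately:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of WAL volume versus spill interval; not Malhar code.
final class WalTradeoff {
  static final class SpillStats {
    final int totalRecords; // records written to the WAL overall
    final int largestSpill; // records written in the single biggest burst

    SpillStats(int totalRecords, int largestSpill) {
      this.totalRecords = totalRecords;
      this.largestSpill = largestSpill;
    }
  }

  // Writing every event as it arrives: no deduplication, one record per event.
  static int immediateRecords(List<List<String>> windows) {
    int records = 0;
    for (List<String> window : windows) {
      records += window.size();
    }
    return records;
  }

  // Buffers distinct keys for `windowsPerSpill` windows, then writes one record per key.
  static SpillStats bufferedRecords(List<List<String>> windows, int windowsPerSpill) {
    if (windowsPerSpill < 1) {
      throw new IllegalArgumentException("windowsPerSpill must be >= 1");
    }
    Set<String> pending = new HashSet<>();
    int total = 0;
    int largest = 0;
    for (int i = 0; i < windows.size(); i++) {
      pending.addAll(windows.get(i)); // repeated keys collapse into one entry
      boolean lastWindow = i == windows.size() - 1;
      if ((i + 1) % windowsPerSpill == 0 || lastWindow) {
        total += pending.size();
        largest = Math.max(largest, pending.size());
        pending.clear();
      }
    }
    return new SpillStats(total, largest);
  }
}
```

For the three windows `[a, a, b]`, `[a, c]`, `[a, b]`, writing every event immediately produces 7 records; spilling every window produces 6 (largest spill 2); spilling every 3 windows produces 3 (largest spill 3). A longer interval removes more duplicates but concentrates more data into each write, which is the spike the issue describes.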

This message was sent by Atlassian JIRA
