apex-dev mailing list archives

From "Thomas Weise (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (APEXMALHAR-2223) Managed state should parallelize WAL writes
Date Sat, 03 Sep 2016 00:13:20 GMT

     [ https://issues.apache.org/jira/browse/APEXMALHAR-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise updated APEXMALHAR-2223:
    Affects Version/s: 3.4.0

> Managed state should parallelize WAL writes
> -------------------------------------------
>                 Key: APEXMALHAR-2223
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2223
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Thomas Weise
> Currently, data is accumulated in memory and written to the WAL only on checkpoint. This
> causes a write spike on checkpoint and does not utilize the HDFS write pipeline. The other
> extreme is to write to the WAL as soon as data arrives and flush only in beforeCheckpoint.
> The downside is that when the same key is written many times, all duplicates end up
> in the WAL. We need to find a balanced approach that the user can potentially fine-tune.
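
One possible middle ground between the two extremes described above could look roughly like the sketch below. It is not the Malhar managed state API: the class BufferedWalWriter, the maxPendingKeys knob, and the in-memory list standing in for the WAL are all hypothetical names chosen for illustration. The idea is to keep only the latest value per key in a small buffer, append the buffer to the WAL whenever it reaches a user-tunable size, and append whatever is left in beforeCheckpoint.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not part of Apache Apex Malhar. */
class BufferedWalWriter {
    /** Stand-in for the real WAL: an in-memory list of appended entries. */
    final List<String> wal = new ArrayList<>();

    /** Latest value per key since the last drain (duplicates collapse here). */
    private final Map<String, String> pending = new LinkedHashMap<>();

    /** Assumed tuning knob: drain once this many distinct keys are buffered. */
    private final int maxPendingKeys;

    BufferedWalWriter(int maxPendingKeys) {
        this.maxPendingKeys = maxPendingKeys;
    }

    /** Buffers a write; a repeated key overwrites its earlier buffered value. */
    void put(String key, String value) {
        pending.put(key, value);
        if (pending.size() >= maxPendingKeys) {
            drain(); // spreads WAL appends across the window, not only at checkpoint
        }
    }

    /** Appends whatever is still buffered; a real WAL would also flush here. */
    void beforeCheckpoint() {
        drain();
    }

    private void drain() {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            wal.add(e.getKey() + "=" + e.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        BufferedWalWriter w = new BufferedWalWriter(2);
        w.put("a", "1");
        w.put("a", "2");      // overwrites a=1 in the buffer
        w.put("b", "3");      // second distinct key triggers a drain
        w.put("a", "4");
        w.beforeCheckpoint(); // drains the remaining entry
        System.out.println(w.wal); // prints [a=2, b=3, a=4]
    }
}
```

With a small maxPendingKeys this behaves like writing on arrival; with a very large value it degenerates to writing on checkpoint only. Duplicates are removed only within one buffered batch, so a key written again after a drain still appears again in the WAL.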

This message was sent by Atlassian JIRA