apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2063) Integrate WAL to FS WindowDataManager
Date Thu, 04 Aug 2016 06:16:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407256#comment-15407256 ]

ASF GitHub Bot commented on APEXMALHAR-2063:
--------------------------------------------

GitHub user chandnisingh reopened a pull request:

    https://github.com/apache/apex-malhar/pull/322

    APEXMALHAR-2063 Made window data manager use file system wal

    @PramodSSImmaneni @tweise  @ilooner 
    Please review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chandnisingh/incubator-apex-malhar APEXMALHAR-2063

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #322
    
----
commit f3ce1a766d95e5a194d1594c763c112d6b2099af
Author: Chandni Singh <csingh@apache.org>
Date:   2016-08-01T07:37:07Z

    APEXMALHAR-2063 Made window data manager use file system wal

----


> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>
> FS Window Data Manager saves meta-data that helps replay the tuples of every completed
> application window after a failure. To do this, it saves the meta-data in one file per window.
> Having many small files on HDFS causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain a mapping
> of how much data was flushed to the WAL in each window.
> Using FileSystemWAL to replay the data of a finished window requires a few changes to
> FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay the data of every finished window, and such a window
> may not be checkpointed. FileSystemWAL truncates the WAL file to the checkpointed point
> after recovery, which poses a problem. WindowDataManager should be able to control
> recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temporary files to the actual
> file is part of its state, which is checkpointed. Since WindowDataManager replays data of
> a window not yet checkpointed, it needs to know the actual temporary file the data is
> being persisted to.
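
The idea of keeping one log plus a per-window mapping of how much data was
flushed can be sketched roughly as follows. This is a minimal in-memory
illustration, not the FileSystemWAL or WindowDataManager code from the pull
request: the class WindowWalSketch and its methods append, endWindow and replay
are hypothetical names, a byte array stands in for the WAL file on HDFS, and
window IDs are assumed to finish in increasing order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only: not the FileSystemWAL or WindowDataManager API.
class WindowWalSketch {
    // Stand-in for the single WAL file; a real WAL would live on a file system.
    private final ByteArrayOutputStream wal = new ByteArrayOutputStream();
    // windowId -> log offset reached when that window finished.
    private final TreeMap<Long, Integer> endOffsets = new TreeMap<>();

    // Append one entry to the log, prefixed with its 4-byte big-endian length.
    void append(byte[] entry) {
        int n = entry.length;
        wal.write(n >>> 24);
        wal.write(n >>> 16);
        wal.write(n >>> 8);
        wal.write(n);
        wal.write(entry, 0, n);
    }

    // Record how far the log has grown at the end of a window.
    // Assumes windows finish in increasing windowId order.
    void endWindow(long windowId) {
        endOffsets.put(windowId, wal.size());
    }

    // Read back the entries written during one finished window,
    // whether or not that window was ever checkpointed.
    List<byte[]> replay(long windowId) {
        Integer end = endOffsets.get(windowId);
        if (end == null) {
            throw new IllegalArgumentException("window not finished: " + windowId);
        }
        // The window's data starts where the previous finished window ended.
        Long prev = endOffsets.lowerKey(windowId);
        int pos = (prev == null) ? 0 : endOffsets.get(prev);
        byte[] log = wal.toByteArray();
        List<byte[]> entries = new ArrayList<>();
        while (pos < end) {
            int n = ((log[pos] & 0xFF) << 24) | ((log[pos + 1] & 0xFF) << 16)
                  | ((log[pos + 2] & 0xFF) << 8) | (log[pos + 3] & 0xFF);
            pos += 4;
            entries.add(Arrays.copyOfRange(log, pos, pos + n));
            pos += n;
        }
        return entries;
    }

    public static void main(String[] args) {
        WindowWalSketch sketch = new WindowWalSketch();
        sketch.append("a".getBytes(StandardCharsets.UTF_8));
        sketch.append("b".getBytes(StandardCharsets.UTF_8));
        sketch.endWindow(1L);
        sketch.append("c".getBytes(StandardCharsets.UTF_8));
        sketch.endWindow(2L);
        // Replays only the entries of window 2.
        for (byte[] e : sketch.replay(2L)) {
            System.out.println(new String(e, StandardCharsets.UTF_8));
        }
    }
}
```

Because the sketch stores only an end offset per window, many windows share one
log instead of producing one small file each. It does not model the temporary
files or the checkpoint-based truncation described in points 1 and 2.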



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
