apex-dev mailing list archives

From "bright chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2130) implement scalable windowed storage
Date Wed, 13 Jul 2016 00:58:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374095#comment-15374095 ]

bright chen commented on APEXMALHAR-2130:

Here are some thoughts on implementing it on top of the state manager:
- Use only one bucket: generate the bucket key from the Event Window and the key, and use
the Stream Window as the timeBucket. The problem with this approach is getting the whole map
of one Event Window, since that lookup has no key. Is it possible for the client code to get
a value by Event Window and key instead of getting the whole map?
- Map each Event Window to one bucket, and use the Stream Window id as the timeBucket. The
problem is that the size of an Event Window varies and could be huge, so this approach is
probably not doable.
- Map each Streaming Window to one bucket. This makes it hard to map Event Windows to buckets,
and also hard to get the whole map by Event Window, as we don't know which bucket to read the
data from.
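
To make the first option concrete, here is a minimal sketch of the composite-key idea. It does
not use Malhar's state manager classes: `CompositeKeySketch`, `compositeKey`, and the in-memory
`HashMap` standing in for the single bucket are all hypothetical names chosen for illustration,
and the timeBucket (Stream Window) is left out for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a single state-manager bucket; not Malhar's API.
class CompositeKeySketch {
    // The single bucket: composite key -> value. Only point lookups are offered.
    private final Map<ByteBuffer, String> bucket = new HashMap<>();

    // Composite key = 8-byte Event Window id followed by the UTF-8 bytes of the key.
    static ByteBuffer compositeKey(long eventWindowId, String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + keyBytes.length);
        buf.putLong(eventWindowId).put(keyBytes);
        buf.flip(); // make the written bytes readable so equals/hashCode cover them
        return buf;
    }

    void put(long eventWindowId, String key, String value) {
        bucket.put(compositeKey(eventWindowId, key), value);
    }

    // Works because the caller supplies both the Event Window and the key.
    String get(long eventWindowId, String key) {
        return bucket.get(compositeKey(eventWindowId, key));
    }
}
```

With only point lookups like this, there is no way to ask for "everything in Event Window 1"
without knowing every key, which is the limitation described in the first option.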

> implement scalable windowed storage
> -----------------------------------
>                 Key: APEXMALHAR-2130
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: bright chen
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the checkpointing
> window id.  This should be done incrementally (ManagedState) to avoid wasting space with unchanged
> 3. When recovering, it takes the recovery window id and restores to that snapshot
> 4. When a window is committed, all windows with a lower ID should be purged from the
> 5. It should implement the WindowedStorage and WindowedKeyedStorage interfaces, and because
> of 2 and 3, we may want to add methods to the WindowedStorage interface so that the implementation
> of WindowedOperator can notify the storage of checkpointing, recovering and committing of
> a window.
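
As a rough illustration of requirements 2 to 4, the notifications could look something like
the sketch below. It is not the WindowedStorage interface: `SnapshotStorageSketch` and its
method names are hypothetical. It also takes a full copy on every checkpoint rather than an
incremental one, and it assumes that "windows with a lower ID" in requirement 4 refers to
snapshots taken at lower window ids.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; class and method names are illustrative, not part of Malhar.
class SnapshotStorageSketch {
    private Map<String, Long> live = new HashMap<>();
    // Snapshots ordered by the window id at which they were taken.
    private final TreeMap<Long, Map<String, Long>> snapshots = new TreeMap<>();

    void put(String key, long value) { live.put(key, value); }

    Long get(String key) { return live.get(key); }

    // Requirement 2: save a snapshot under the checkpointing window id.
    // A full copy is taken here for simplicity; the issue asks for an incremental one.
    void checkpointed(long windowId) {
        snapshots.put(windowId, new HashMap<>(live));
    }

    // Requirement 3: restore the state saved at the recovery window id.
    void recover(long windowId) {
        Map<String, Long> snapshot = snapshots.get(windowId);
        if (snapshot == null) {
            throw new IllegalStateException("no snapshot for window " + windowId);
        }
        live = new HashMap<>(snapshot);
    }

    // Requirement 4 (as assumed above): drop snapshots with an id strictly lower
    // than the committed window id; the committed snapshot itself is kept.
    void committed(long windowId) {
        snapshots.headMap(windowId).clear();
    }

    int snapshotCount() { return snapshots.size(); }
}
```

An operator would call `checkpointed`, `recover` and `committed` from its own checkpoint,
recovery and commit callbacks, which is the kind of notification item 5 refers to.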

This message was sent by Atlassian JIRA
