flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiaowei Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4854) Efficient Batch Operator in Streaming
Date Wed, 19 Oct 2016 01:32:58 GMT
Xiaowei Jiang created FLINK-4854:
------------------------------------

             Summary: Efficient Batch Operator in Streaming
                 Key: FLINK-4854
                 URL: https://issues.apache.org/jira/browse/FLINK-4854
             Project: Flink
          Issue Type: Improvement
            Reporter: Xiaowei Jiang
            Assignee: MaGuowei


Very often, it's more efficient to process a batch of records at once instead of processing
them one by one. We can use window to achieve this functionality. However, window will store
all records in states, which can be costly. It's desirable to have an efficient implementation
of batch operator. The batch operator works per task and behave similarly to aligned windows.
Here is an example of how the interface looks like to a user.

interface BatchFunction {
    // add the record to the buffer
    // returns if the batch is ready to be flushed
    boolean addRecord(T record);

    // process all pending records in the buffer
    void flush(Collector collector) ;
}

DataStream ds = ...
BatchFunction func = ...
ds.batch(func);

The operator calls addRecord for each record. The batch function saves the record in its own
buffer. The addRecord returns if the pending buffer should be flushed. In that case, the operator
invokes flush.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message