flink-dev mailing list archives

From Xiaowei Jiang <xiaow...@gmail.com>
Subject Efficient Batch Operator in Streaming
Date Thu, 20 Oct 2016 07:50:51 GMT
It is often more efficient to process a batch of records at once than to
process them one by one. A window can provide this functionality, but a
window stores all of its records in state, which can be costly. It would be
desirable to have an efficient implementation of a batch operator. The batch
operator works per task and behaves similarly to aligned windows. Here is an
example of what the interface looks like to a user.

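// T is the input record type; OUT is the type of the emitted results.
interface BatchFunction<T, OUT> {
    // Adds the record to the buffer.
    // Returns true if the batch is ready to be flushed.
    boolean addRecord(T record);

    // Processes all pending records in the buffer and emits the results.
    void flush(Collector<OUT> collector);
}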

DataStream ds = ...
BatchFunction func = ...
ds.batch(func);

The operator calls addRecord for each record. The batch function saves the
record in its own buffer. addRecord returns true if the pending buffer
should be flushed, in which case the operator invokes flush.
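
To make the call sequence concrete, here is a minimal sketch in plain Java
of how an operator might drive such a function. It is only an illustration
under several assumptions: Collector is a small stand-in for Flink's
interface so the code compiles without Flink, the type parameters T and OUT
are added for type safety, SumBatchFunction and BatchOperatorSketch are
hypothetical names, and the final flush at end of input is a guess, since the
proposal does not say what happens to a partially filled buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Flink's Collector, so the sketch has no Flink dependency.
interface Collector<OUT> {
    void collect(OUT record);
}

// The proposed interface, with type parameters added (an assumption).
interface BatchFunction<T, OUT> {
    // Adds the record to the buffer; returns true if the batch should be flushed.
    boolean addRecord(T record);

    // Processes all pending records in the buffer and emits the results.
    void flush(Collector<OUT> collector);
}

// Hypothetical user function: buffers integers and emits one sum per batch.
final class SumBatchFunction implements BatchFunction<Integer, Integer> {
    private final int batchSize;
    private final List<Integer> buffer = new ArrayList<>();

    SumBatchFunction(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public boolean addRecord(Integer record) {
        buffer.add(record);
        return buffer.size() >= batchSize;
    }

    @Override
    public void flush(Collector<Integer> collector) {
        if (buffer.isEmpty()) {
            return; // nothing pending, emit nothing
        }
        int sum = 0;
        for (int value : buffer) {
            sum += value;
        }
        collector.collect(sum);
        buffer.clear();
    }
}

// Hypothetical driver that plays the role of the operator.
final class BatchOperatorSketch {
    static <T, OUT> List<OUT> run(Iterable<T> input, BatchFunction<T, OUT> func) {
        List<OUT> output = new ArrayList<>();
        Collector<OUT> collector = output::add;
        for (T record : input) {
            // Flush only when the function reports that the batch is ready.
            if (func.addRecord(record)) {
                func.flush(collector);
            }
        }
        // Assumption: flush whatever is still buffered when the input ends.
        func.flush(collector);
        return output;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(run(input, new SumBatchFunction(3)));
    }
}
```

With a batch size of 3 and the input 1 through 7, the driver flushes after the
third and sixth records and once more at the end, so main prints [6, 15, 7].
Note that the buffer here lives in the function object rather than in
operator state, which is the source of the efficiency gain described above.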

Please share your thoughts. The corresponding JIRA is
https://issues.apache.org/jira/browse/FLINK-4854

Xiaowei
