kafka-dev mailing list archives

From "Sophie Blee-Goldman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-8627) Investigate batching on state restore
Date Wed, 03 Jul 2019 19:40:00 GMT
Sophie Blee-Goldman created KAFKA-8627:

             Summary: Investigate batching on state restore
                 Key: KAFKA-8627
                 URL: https://issues.apache.org/jira/browse/KAFKA-8627
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Sophie Blee-Goldman

Currently, when rebuilding state from scratch, we form batches from whatever poll() returns
and write them to RocksDB. Given the structure of RocksDB, inserting large sorted batches
gives the best performance when writing large amounts of data at once, so we should
investigate the potential restore-time improvement from:

1) Larger batches – either by tuning the restore consumer to return larger amounts of data,
buffering records into larger batches, or both

2) Sorting batches by key before writing them


These two factors are likely to be coupled, so we should measure the performance gains and
losses while varying both where possible (i.e. turn sorting on/off across a range of batch sizes).
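
To make the two knobs concrete, here is a minimal sketch of how buffering and sorting could
be combined during restore. It is not the Kafka Streams implementation: RestoreBatcher,
Record, and the sink callback are hypothetical names introduced for illustration, and the
sink stands in for whatever actually writes a batch to RocksDB. The sketch assumes the store
uses RocksDB's default bytewise key ordering, so keys are compared as unsigned bytes. The
sort is stable, so repeated updates to the same key keep their changelog order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: accumulates polled records into larger batches and
// optionally sorts each batch by key before handing it to a sink.
class RestoreBatcher {

    // Raw key/value pair as it might arrive from a changelog topic (illustrative type).
    static final class Record {
        public final byte[] key;
        public final byte[] value;

        public Record(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Unsigned lexicographic order over key bytes (assumed to match the store's comparator).
    static final Comparator<Record> BY_KEY =
            (a, b) -> Arrays.compareUnsigned(a.key, b.key);

    private final int maxBatchSize;
    private final boolean sort;
    private final Consumer<List<Record>> sink;
    private final List<Record> buffer = new ArrayList<>();

    RestoreBatcher(int maxBatchSize, boolean sort, Consumer<List<Record>> sink) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sort = sort;
        this.sink = sink;
    }

    // Buffer the records returned by one poll; flush whenever the batch is full.
    void add(List<Record> polled) {
        for (Record r : polled) {
            buffer.add(r);
            if (buffer.size() >= maxBatchSize) {
                flush();
            }
        }
    }

    // Hand the buffered records to the sink, sorted by key if sorting is enabled.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Record> batch = new ArrayList<>(buffer);
        buffer.clear();
        if (sort) {
            batch.sort(BY_KEY); // List.sort is stable: same-key updates keep arrival order
        }
        sink.accept(batch);
    }
}
```

An experiment along the lines described above would construct the batcher with sort set to
true and to false for each candidate maxBatchSize, time the full restore, and call flush()
once at the end so the final partial batch is written.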

