kafka-dev mailing list archives

From "John Roesler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-8478) Poll for more records before forced processing
Date Tue, 04 Jun 2019 15:35:08 GMT
John Roesler created KAFKA-8478:
-----------------------------------

             Summary: Poll for more records before forced processing
                 Key: KAFKA-8478
                 URL: https://issues.apache.org/jira/browse/KAFKA-8478
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: John Roesler


While analyzing Streams's poll/process loop, I noticed a potential problem.
The algorithm of runOnce is:

{code}
loop0:
  long poll for records (100ms)
  loop1:
    loop2: for BATCH_SIZE iterations:
      process one record in each task that has data enqueued
    adjust BATCH_SIZE
    if loop2 processed any records, repeat loop1
    else, break loop1 and repeat loop0
{code}

There's potentially an unwanted interaction between "keep processing as long as any record
is processed" and forcing processing after `max.task.idle.ms`.

If there are two tasks, A and B, and A runs out of records on one input before B does, then B
could keep the processing loop (loop1) running, and hence prevent A from getting any new records,
until `max.task.idle.ms` expires, at which point A will force processing on its other input
partition. The intent of idling is to at least give A a chance of getting more records on the
empty input, but in this situation, we'd never even check for more records before forcing processing.
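
To make the scenario concrete, here is a minimal simulation of the loop above. It is only a sketch, not the Streams code: the class and method names are hypothetical, time is a simulated clock, and it assumes each processed record costs 1 ms, that A starts idling at time 0, and that a poll with nothing to fetch blocks for the full 100 ms. It counts how many polls happen between A noticing its empty input and A being forced to process.

```java
// Hypothetical model of the poll/process loop; not the actual Kafka Streams implementation.
final class IdleStarvationSketch {

    /**
     * Returns the number of polls that complete between task A starting to idle
     * (at simulated time 0) and task A being forced to process.
     */
    static int pollsBeforeForcedProcessing(int recordsBufferedForB, long maxTaskIdleMs) {
        long nowMs = 0;                 // simulated clock (assumption: 1 ms per processed record)
        final long idleStartMs = 0;     // A noticed its empty input right after the last poll
        int pollsSinceIdleStart = 0;
        int bRemaining = recordsBufferedForB;
        final int batchSize = 10;       // stands in for BATCH_SIZE; kept fixed for simplicity

        while (true) {                                   // loop0
            boolean processedAny;
            do {                                         // loop1
                processedAny = false;
                for (int i = 0; i < batchSize; i++) {    // loop2
                    if (nowMs - idleStartMs >= maxTaskIdleMs) {
                        // The idle timeout expired: A is forced to process.
                        return pollsSinceIdleStart;
                    }
                    if (bRemaining > 0) {
                        // Only B has data enqueued, so only B makes progress.
                        bRemaining--;
                        nowMs++;
                        processedAny = true;
                    }
                }
            } while (processedAny);

            // loop1 broke, so we are back in loop0 and finally poll again.
            pollsSinceIdleStart++;
            nowMs += 100;               // assumption: the long poll blocks for its full timeout
        }
    }
}
```

Under these assumptions, `pollsBeforeForcedProcessing(1000, 50)` yields 0: B's backlog keeps loop1 busy past the idle timeout, so A is forced without a single poll. With a small backlog, `pollsBeforeForcedProcessing(20, 50)` yields 1, because loop1 drains quickly and the loop returns to polling before the timeout.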

I'm thinking we should only force processing if at least one poll has completed since we noticed
the task was missing inputs (otherwise, we may as well not bother idling at all).
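
One way the proposed condition could look, as a rough sketch only: the class below is hypothetical and does not correspond to any existing Streams class. It tracks when a task first noticed a missing input and whether a poll has completed since then, and allows forced processing only when both the timeout has expired and such a poll has happened.

```java
// Hypothetical helper illustrating the proposal; all names are made up for this sketch.
final class IdleGateSketch {
    private final long maxTaskIdleMs;
    private long idleStartMs = -1;                // -1 means the task is not idling
    private boolean polledSinceIdleStart = false;

    IdleGateSketch(long maxTaskIdleMs) {
        this.maxTaskIdleMs = maxTaskIdleMs;
    }

    /** Called when the task notices that one of its inputs is empty. */
    void onMissingInput(long nowMs) {
        if (idleStartMs < 0) {
            idleStartMs = nowMs;
            polledSinceIdleStart = false;
        }
    }

    /** Called after every completed poll, whether or not it returned records. */
    void onPollCompleted() {
        if (idleStartMs >= 0) {
            polledSinceIdleStart = true;
        }
    }

    /** Called when all inputs have data again, which ends the idle period. */
    void onAllInputsAvailable() {
        idleStartMs = -1;
        polledSinceIdleStart = false;
    }

    /** Force processing only if the timeout expired AND a poll completed since idling began. */
    boolean shouldForceProcessing(long nowMs) {
        return idleStartMs >= 0
            && polledSinceIdleStart
            && nowMs - idleStartMs >= maxTaskIdleMs;
    }
}
```

With a 50 ms timeout, a task that started idling at time 0 would not be forced at time 60 until `onPollCompleted()` has been called at least once, which guarantees the empty input got one chance to receive records.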



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
