kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6106) Postpone normal processing of tasks within a thread until restoration of all tasks have completed
Date Sun, 05 Nov 2017 21:41:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239733#comment-16239733
] 

Guozhang Wang commented on KAFKA-6106:
--------------------------------------

Today IQ will not be allowed if the stream thread is not {{State.RUNNING}} and the stream
thread can only turn to {{RUNNING}} if all its tasks are in RUNNING states. So technically
we can fix it two ways:

1) the proposal mentioned in this JIRA.
2) change the StateStoreProvider class, to make stores be queryable on the task level than
on the thread level. So that some tasks' stores can be queryable even though other stores
are not.

I'm inclined to delay going on the second option for now (in other words, add this config
to let users choose), because of the following:

a) most streaming applications that leverage on IQ (think: analytics, monitoring) can only
function when all such state stores are queryable. Making only part of these stores to be
queryable while others are still bootstrapping would not help these applications.
b) a known issue today for IQ is that states from different tasks are not from the same snapshots
so that if they are logically correlated IQ has "phantom reads" scenarios; we have thought
about how to remedy such issues and one of them is to leverage on exactly once semantics.
In that case, moving forward some tasks while still restoring other tasks will simply make
the tasks to be run at further different times.

If in the future we do observe that this is a common request where users do not care about
partial reads or inconsistent reads across states, but are really keen to some of the states
to be queryable ASAP (btw on top of my head I feel even in this case the right way to solve
it would be task assignment improvements?), then we can consider the second option.

> Postpone normal processing of tasks within a thread until restoration of all tasks have
completed
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6106
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.11.0.1, 1.0.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>
> Let's say a stream thread hosts multiple tasks, A and B. At the very beginning when A
and B are assigned to the thread, the thread state is {{TASKS_ASSIGNED}}, and the thread start
restoring these two tasks during this state using the restore consumer while using normal
consumer for heartbeating.
> If task A's restoration has completed earlier than task B, then the thread will start
processing A immediately even when it is still in the {{TASKS_ASSIGNED}} phase. But processing
task A will slow down restoration of task B since it is single-thread. So the thread's transition
to {{RUNNING}} when all of its assigned tasks have completed restoring and now can be processed
will be delayed.
> Note that the streams instance's state will only transit to {{RUNNING}} when all of its
threads have transit to {{RUNNING}}, so the instance's transition will also be delayed by
this scenario.
> We'd better to not start processing ready tasks immediately, but instead focus on restoration
during the {{TASKS_ASSIGNED}} state to shorten the overall time of the instance's state transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message