kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Randall Hauch <rha...@gmail.com>
Subject Re: [DISCUSS] KIP-449: Add connector contexts to Connect worker logs
Date Fri, 12 Apr 2019 22:24:20 GMT
Thanks for the review and feedback, Konstantine.

1. Great suggestion. I've updated the KIP to hopefully make it more clear
that the uncommented line is unchanged from the existing Log4J
configuration file.

2. Regarding including a `-` before the task number is acceptable if it
makes it easier to, read and filter. I've updated the KIP and PR to
incorporate your suggestion.

3. Task numbers do start at 0, as seen by the DistributedHerder code that
creates the tasks (
https://github.com/apache/kafka/blob/02221bd907a23041c95ce6446986bff631652b3a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L608-L611).
I've updated the KIP to highlight that task numbers are 0-based. As you can
see in the sample log included in the KIP, "task-0" corresponds to the
first task, not to the connector. In fact, the following are examples of
the log messages with the "worker" scope where the Connector implementation
is called:

[2019-04-02 17:01:38,315] INFO [local-file-source|worker] Creating
connector local-file-source of type FileStreamSource
(org.apache.kafka.connect.runtime.Worker:227)
[2019-04-02 17:01:38,317] INFO [local-file-source|worker] Instantiated
connector local-file-source with version 2.3.0-SNAPSHOT of type class
org.apache.kafka.connect.file.FileStreamSourceConnector
(org.apache.kafka.connect.runtime.Worker:230)
...

[2019-04-02 17:11:46,408] INFO [local-file-sink|worker] Stopping connector
local-file-sink (org.apache.kafka.connect.runtime.Worker:334)
[2019-04-02 17:11:46,408] INFO [local-file-sink|worker] Stopped connector
local-file-sink (org.apache.kafka.connect.runtime.Worker:350)


The FileStreamSourceConnector class does not actually log any messages
itself, but if it did  those would be with the "[local-file-source|worker]"
context and would include the name of the class and line number in
parentheses (instead of e.g.,
"org.apache.kafka.connect.runtime.Worker:230").

4. I thought about doing a more complex example, but IMO the extra log
lines made the *sample* in the KIP quite a bit longer harder to understand.
I thought it important to keep the KIP a bit more readable while showing
which scope appear for the different log messages. A second task would
essentially have the nearly the same log messages, just with a task number
in the scope.

5. The PR already changes the `runtime/src/test/resources/log4j.properties`
to include the connector context MDC parameter. Because that's not a public
API, I've not mentioned it in the KIP.

6. I guess I understand your goal - it's not always clear which worker is
being referenced. However, I'm not sure whether it's that valuable to
include the "client.id" (if set) just in the "worker" scope. That means
that it's maybe more useful to introduce a second MDC parameter (e.g., "%{
worker.id}") and optionally include that on all log messages. We'd have to
set the MDC context in the code in each thread, which isn't too much
effort. The other advantage of this approach is that it doesn't have to be
configurable: you can control it via your own logging configuration file
(rather than optionally including it in the "worker" scope on some of the
log messages). Thoughts? What would the "%{worker.id}" MDC value be set to
if "client.id" is not set?

Final notes: Removed the placeholders, and corrected the typos and grammar.

Thanks again for the detailed review!

On Fri, Apr 12, 2019 at 2:05 PM Konstantine Karantasis <
konstantine@confluent.io> wrote:

> Thanks for the KIP Randall.
>
> It might not be obvious right away, but this is a great improvement when
> running Connect with multiple connectors or when debugging Connect, for
> instance in integration tests or system tests. KIP looks good to me
> overall, I have just a few comments below:
>
> 1. In the snippet of the config, by including an uncommented line, maybe
> it's not immediately clear that this line is an existing line in
> connect-log4j.properties and not an addition. Should this be mentioned in a
> separate code block or in a different way?
>
> 2. Currently when adding the taskId to a connector or connector task, we
> precede it with a dash (-). I feel this addition would make it easier to
> parse the taskId visually as well as with parsing tools (for example I can
> use `-` as a delimiter easier than 'k' or even a multi-character delimiter
> such as 'task'). Do you think it's an overhead?
>
> 3. I think a specific mention to the convention used on the base-index for
> taskIds would be useful. For example, taskId equal to zero in most cases
> represents the (source/sink) connector, while taskId > 0 corresponds to
> (source/sink) tasks. By reading the KIP, I assume that task0 (or task-0 if
> you agree with my previous comment) will be the connector. Is this the
> case?
>
> 4. Should the example include a snippet for taskId > 0?
>
> 5. Do you intend to enable connector MDC in tests? In this case how would
> you distinguish between multiple workers in integrations tests? Do you
> intend to address this here, in the PR corresponding to this KIP or follow
> up later?
>
> 6. Related to my previous comment (5). The property 'client.id' is already
> used to identify a Connect worker. Have you thought of offering the ability
> to identify the worker MDC if this property is set, keeping at the same
> time with MDC the ability to not include it as identifier?
>
> Finally, I'd suggest removing text placeholders. I think it will be easier
> to follow the KIP. Currently I see placeholder text below Contents and in
> Rejected alternatives.
>
> And my least favorite section, but I can't help it :)
> Typos/grammar:
> to understand what (is) happening within the worker,
> Kafka Connect use(s) ...
> quotes maybe not needed, text already verbatim in
> `config/connect-log4j.properties`
>
> Best,
> Konstantine
>
> On Tue, Apr 2, 2019 at 4:20 PM Randall Hauch <rhauch@gmail.com> wrote:
>
> > I've been working on https://github.com/apache/kafka/pull/5743 for a
> > while,
> > but there were a number of comment, suggestions, and mild concerns on the
> > PR. One of those comments was that maybe changing the Connect log content
> > in this way probably warrants a KIP. So here it is:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
> >
> > I've also updated my PR reflect the KIP. Please reply with comments
> and/or
> > feedback.
> >
> > Best regards,
> >
> > Randall
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message