kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Roesler <j...@confluent.io>
Subject Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils
Date Fri, 09 Mar 2018 17:54:12 GMT
Hi Guozhang and Bill,

I'll summarize what I'm currently thinking in light of all the discussion:

Mock Processor Context:
===================

Here's how I see the use cases for the two mocks differing:

1. o.a.k.test.MPC: Crafted for testing Streams use cases. Implements
AbstractProcessorContext, actually forward to child processor nodes, allow
restoring a state store. Most importantly, the freedom to do stuff
convenient for our tests without impacting anyone.

2. (test-utils) MPC: Crafted for testing community Processors (and
friends). Very flat and simple implementation (so people can read it in one
sitting); i.e., doesn't drag in other data models like RecordContext. Test
one processor in isolation, so generally don't bother with complex logic
like scheduling punctuators, forwarding results, or restoring state stores.
Most importantly, an API that can be stable.

So, I really am leaning toward keeping both implementations. I like Bill's
suggestion of renaming the unit testing class to
InternalMockProcessorContext, since having classes with the same name in
different packages is confusing. I look forward to the day when Java 9
takes off and we can actually hide internal classes from the public
interface.

test-utils module:
=============

This is actually out of scope for this KIP if we keep both MPC
implementations, but it has been a major feature of this discussion, so we
may as well see it though.

I've waffled a bit on this point, but right now I would propose we
restructure the streams directory thusly:

streams/ (artifact name := "streams", the actual streams code lives here)
- test-utils/ (this is the current test-utils artifact, depends on
"streams")
- tests/ (new module, depends on "streams" and "test-utils", *NO published
artifact*)

This gets us out of the circular dependency without having to engage in any
Gradle shenanigans while preserving "test-utils" as a separate artifact.
This is good because: 1) the test-utils don't need to be in production
code, so it's nice to have a separate artifact, 2) test-utils is already
public in 1.1, and it's a bummer to introduce users' code when we can so
easily avoid it.

Note, though, that if we agree to keep both MPC implementations, then this
really is just important for rewriting our tests to use TopologyTestDriver,
and in fact only the tests that need it should move to "streams/tests/".

What say you?

-John

On Fri, Mar 9, 2018 at 9:01 AM, Guozhang Wang <wangguoz@gmail.com> wrote:

> Hmm.. it seems to be a general issue then, since we were planning to also
> replace the KStreamTestDriver and ProcessorTopologyTestDriver with the new
> TopologyTestDriver soon, so if the argument that testing dependency could
> still cause circular dependencies holds it means we cannot do that as well.
>
> My understanding on gradle dependencies has been that test dependencies are
> not required to compile when compiling the project, but only required when
> testing the project; and the way we script gradle follows the way that for
> any test tasks of the project we require compiling it first so this is
> fine. John / Bill, could you elaborate a bit more on the maintenance
> complexity concerns?
>
>
> Guozhang
>
> On Fri, Mar 9, 2018 at 7:40 AM, Bill Bejeck <bbejeck@gmail.com> wrote:
>
> > John,
> >
> > Sorry for the delayed response.  Thanks for the KIP, I'm +1 on it, and I
> > don't have any further comments on the KIP itself aside from the comments
> > that others have raised.
> >
> > Regarding the existing MockProcessorContext and its removal in favor of
> the
> > one added from this KIP, I'm actually in favor of keeping both.
> >
> > IMHO it's reasonable to have both because the testing requirements are
> > different.  Most users are trying to verify their logic works as expected
> > within a Kafka Streams application and aren't concerned (or shouldn't be
> at
> > least, again IMHO) with testing Kafka Streams itself, that is the
> > responsibility of the Kafka Streams developers and contributors.
> >
> > However, for the developers and contributors of Kafka Streams, the need
> to
> > test the internals of how Streams works is the primary concern and could
> at
> > times require different logic or available methods from a given mock
> > object.
> >
> > I have a couple of thoughts on mitigation of having two
> > MockProcessorContext objects
> >
> >    1. Leave the current MockProcessorContext in the o.a.k.test package
> but
> >    rename it to InternalMockProcessorContext and add some documentation
> as
> > to
> >    why it's there.
> >    2. Create a new package under o.a.k.test, called internals and move
> the
> >    existing MockProcessorContext there, but that would require a change
> to
> > the
> >    visibility of the MockProcessorContext#allStateStores() method to
> > public.
> >
> > Just wanted to throw in my 2 cents.
> >
> > Thanks,
> > Bill
> >
> > On Thu, Mar 8, 2018 at 11:51 PM, John Roesler <john@confluent.io> wrote:
> >
> > > I think what you're suggesting is to:
> > > 1. compile the main streams code, but not the tests
> > > 2. compile test-utils (and compile and run the test-utils tests)
> > > 3. compile and run the streams tests
> > >
> > > This works in theory, since the test-utils depends on the main streams
> > > code, but not the streams tests. and the streams tests depend on
> > test-utils
> > > while the main streams code does not.
> > >
> > > But after poking around a bit and reading up on it, I think this is not
> > > possible, or at least not mainstream.
> > >
> > > The issue is that dependencies are formed between projects, in this
> case
> > > streams and streams:test-utils. The upstream project must be built
> before
> > > the dependant one, regardless of whether the dependency is for
> compiling
> > > the main code or the test code. This means we do have a circular
> > dependency
> > > on our hands if we want the tests in streams to use the test-utils,
> since
> > > they'd both have to be built before the other.
> > >
> > > Gradle seems to be quite scriptable, so there may be some way to
> achieve
> > > this, but increasing the complexity of the build also introduces a
> > project
> > > maintenance concern.
> > >
> > > The MockProcessorContext itself is pretty simple, so I'm tempted to
> argue
> > > that we should just have one for internal unit tests and another for
> > > test-utils, however this situation also afflicts KAFKA-6474
> > > <https://issues.apache.org/jira/browse/KAFKA-6474>, and the
> > > TopologyTestDriver is not so trivial.
> > >
> > > I think the best thing at this point is to go ahead and fold the
> > test-utils
> > > into the streams project. We can put it into a separate "testutils"
> > package
> > > to make it easy to identify which code is for test support and which
> code
> > > is Kafka Streams. The biggest bummer about this suggestion is that it
> we
> > > *just* introduced the test-utils artifact, so folks would to add that
> > > artifact in 1.1 to write their tests and then have to drop it again in
> > 1.2.
> > >
> > > The other major solution is to create a new gradle project for the
> > streams
> > > unit tests, which depends on streams and test-utils and move all the
> > > streams unit tests there. I'm pretty sure we can configure gradle just
> to
> > > include this project for running tests and not actually package any
> > > artifacts. This structure basically expresses your observation that the
> > > test code is essentially a separate module from the main streams code.
> > >
> > > Of course, I'm open to alternatives, especially if someone with more
> > > experience in Gradle is aware of a solution.
> > >
> > > Thanks,
> > > -John
> > >
> > >
> > > On Thu, Mar 8, 2018 at 3:39 PM, Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > > > Isn't MockProcessorContext in o.a.k.test part of the unit-test
> package
> > > > but not the main package?
> > > >
> > > > This should resolve the dependency issue.
> > > >
> > > > -Matthias
> > > >
> > > > On 3/8/18 3:32 PM, John Roesler wrote:
> > > > > Actually, replacing the MockProcessorContext in o.a.k.test could
> be a
> > > bit
> > > > > tricky, since it would make the "streams" module depend on
> > > > > "streams:test-utils", but "streams:test-utils" already depends on
> > > > "streams".
> > > > >
> > > > > At first glance, it seems like the options are:
> > > > > 1. leave the two separate implementations in place. This shouldn't
> be
> > > > > underestimated, especially since our internal tests may need
> > different
> > > > > things from a mocked P.C. than our API users.
> > > > > 2. move the public testing artifacts into the regular streams
> module
> > > > > 3. move the unit tests for Streams into a third module that depends
> > on
> > > > both
> > > > > streams and test-utils. Yuck!
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > > On Thu, Mar 8, 2018 at 3:16 PM, John Roesler <john@confluent.io>
> > > wrote:
> > > > >
> > > > >> Thanks for the review, Guozhang,
> > > > >>
> > > > >> In response:
> > > > >> 1. I missed that! I'll look into it and update the KIP.
> > > > >>
> > > > >> 2. I was planning to use the real implementation, since folks
> might
> > > > >> register some metrics in the processors and want to verify the
> > values
> > > > that
> > > > >> get recorded. If the concern is about initializing all the stuff
> > > that's
> > > > in
> > > > >> the Metrics object, I can instantiate it lazily or even make
it
> > > > optional by
> > > > >> taking a nullable constructor parameter.
> > > > >>
> > > > >> 3. Agreed. I think that's the real sharp edge here. I actually
> think
> > > it
> > > > >> would be neat to auto-trigger those scheduled punctuators, but
it
> > > seems
> > > > >> like that moves this component out of "mock" territory and into
> > > "driver"
> > > > >> territory. Since we already have the TopologyTestDriver, I'd
> prefer
> > to
> > > > >> focus on keeping the mock lean. I agree it should be in the
> javadoc
> > as
> > > > well
> > > > >> as the web documentation.
> > > > >>
> > > > >> Thanks,
> > > > >> -John
> > > > >>
> > > > >> On Thu, Mar 8, 2018 at 1:46 PM, Guozhang Wang <wangguoz@gmail.com
> >
> > > > wrote:
> > > > >>
> > > > >>> Hello John,
> > > > >>>
> > > > >>> Thanks for the KIP. I made a pass over the wiki page and
here are
> > > some
> > > > >>> comments:
> > > > >>>
> > > > >>> 1. Meta-comment: there is an internal class MockProcessorContext
> > > under
> > > > the
> > > > >>> o.a.k.test package, which should be replaced as part of this
KIP.
> > > > >>>
> > > > >>> 2. In @Override StreamsMetrics metrics(), will you return
a fully
> > > > created
> > > > >>> StreamsMetricsImpl object or are you planning to use the
> > > > >>> MockStreamsMetrics? Note that for the latter case you probably
> need
> > > to
> > > > >>> look
> > > > >>> into https://issues.apache.org/jira/browse/KAFKA-5676 as
well.
> > > > >>>
> > > > >>> 3. Not related to the KIP changes themselves: about
> > > > >>> "context.scheduledPunctuators": we need to well document
that in
> > the
> > > > >>> MockProcessorContext the scheduled punctuator will never
by
> > > > >>> auto-triggered,
> > > > >>> and hence it is only for testing people's code that some
> > punctuators
> > > > are
> > > > >>> indeed registered, and if people want full auto punctuation
> testing
> > > > they
> > > > >>> have to go with TopologyTestDriver.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> Guozhang
> > > > >>>
> > > > >>>
> > > > >>> On Wed, Mar 7, 2018 at 8:04 PM, John Roesler <john@confluent.io>
> > > > wrote:
> > > > >>>
> > > > >>>> On Wed, Mar 7, 2018 at 8:03 PM, John Roesler <john@confluent.io
> >
> > > > wrote:
> > > > >>>>
> > > > >>>>> Thanks Ted,
> > > > >>>>>
> > > > >>>>> Sure thing; I updated the example code in the KIP
with a little
> > > > >>> snippet.
> > > > >>>>>
> > > > >>>>> -John
> > > > >>>>>
> > > > >>>>> On Wed, Mar 7, 2018 at 7:18 PM, Ted Yu <yuzhihong@gmail.com>
> > > wrote:
> > > > >>>>>
> > > > >>>>>> Looks good.
> > > > >>>>>>
> > > > >>>>>> See if you can add punctuator into the sample
code.
> > > > >>>>>>
> > > > >>>>>> On Wed, Mar 7, 2018 at 7:10 PM, John Roesler
<
> john@confluent.io
> > >
> > > > >>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Dear Kafka community,
> > > > >>>>>>>
> > > > >>>>>>> I am proposing KIP-267 to augment the public
Streams test
> utils
> > > > >>> API.
> > > > >>>>>>> The goal is to simplify testing of Kafka
Streams
> applications.
> > > > >>>>>>>
> > > > >>>>>>> Please find details in the
> > > > >>>>>>> wiki:https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > >>>>>>> 267%3A+Add+Processor+Unit+Test+Support+to+Kafka+Streams+
> > > Test+Utils
> > > > >>>>>>>
> > > > >>>>>>> An initial WIP PR can be found here:https://github.com/
> > > > >>>>>>> apache/kafka/pull/4662
> > > > >>>>>>>
> > > > >>>>>>> I also included the user-list (please hit
"reply-all" to
> > include
> > > > >>> both
> > > > >>>>>>> lists in this KIP discussion).
> > > > >>>>>>>
> > > > >>>>>>> Thanks,
> > > > >>>>>>>
> > > > >>>>>>> -John
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> -- Guozhang
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message