spark-reviews mailing list archives

From tdas <>
Subject [GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Date Thu, 23 Jun 2016 17:38:17 GMT
Github user tdas commented on the issue:
    1. I didn't quite get what you meant by "But your description of what the code is currently
    is not accurate, and your recommendation does not meet the use cases." I just collapsed
the three cases into two - when the user has NO PREFERENCES (the system SHOULD figure out
how to schedule partitions on the same executors consistently), and SOME PREFERENCES (because
of co-located brokers, skew, or whatever). Why doesn't this recommendation meet the criteria?
    2. I agree with the argument that there is a whole lot of stuff you cannot do without exposing
a () => Consumer function. But that's where the question of API stability comes in. At
this late stage of the 2.0 release, I would much rather provide a simpler API for simpler use cases
that we know will not break, than an API that supports everything but is more prone to
breaking if Kafka breaks its API. We can always start simple and then add more advanced interfaces
in the future.
    3. Wrapping things up with extra Spark classes and interfaces is a cost we have to pay
to prevent API breakage in the future. It is an investment we are making in
every part of Spark - SparkSession (using a builder pattern instead of exposing the constructor),
SQL data sources (never expose any 3rd-party classes), etc. It's a hard-learnt lesson.
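To make point 1 concrete, here is a minimal Scala sketch of the two collapsed cases. The names `LocationPreference`, `NoPreference`, and `SomePreferences` are illustrative only, not the actual Spark API: with no preferences, a deterministic hash maps the same partition to the same executor across batches; with some preferences, the user-supplied mapping wins.

```scala
// Hypothetical sketch (not the real Spark API) of the two cases above.
sealed trait LocationPreference

// NO PREFERENCES: the system schedules partitions consistently on its own.
case object NoPreference extends LocationPreference

// SOME PREFERENCES: the user supplies a host per (topic, partition),
// e.g. for co-located brokers or known skew.
final case class SomePreferences(hostFor: Map[(String, Int), String])
    extends LocationPreference

def preferredHost(pref: LocationPreference,
                  topic: String,
                  partition: Int,
                  executors: IndexedSeq[String]): Option[String] =
  pref match {
    case NoPreference if executors.nonEmpty =>
      // Deterministic hash: the same partition always lands on the same
      // executor, giving consistent placement without user input.
      val i = (((topic, partition).hashCode % executors.size)
                + executors.size) % executors.size
      Some(executors(i))
    case NoPreference        => None
    case SomePreferences(m)  => m.get((topic, partition))
  }
```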
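The trade-off in point 2 can be sketched as two hypothetical API shapes (`Consumer` here is a stand-in trait, not Kafka's actual class): a factory-based API leaks the third-party type into the signature, while a parameter-based API keeps only Spark-owned types public.

```scala
// Stand-in for Kafka's consumer class, to keep the sketch self-contained.
trait Consumer { def poll(): Seq[String] }

// Shape A: accept a () => Consumer factory. Maximally flexible, but the
// third-party Consumer type is part of the public signature, so any
// change to that type breaks this API.
def streamViaFactory(factory: () => Consumer): Seq[String] =
  factory().poll()

// Shape B: accept plain parameters and build the consumer internally.
// Covers the simple use cases without exposing third-party classes.
def streamViaParams(params: Map[String, String]): Seq[String] = {
  val c = new Consumer { def poll(): Seq[String] = params.keys.toSeq }
  c.poll()
}
```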
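As a rough illustration of the builder-pattern style mentioned in point 3 (names are illustrative, modeled loosely on SparkSession.builder rather than copied from it): because the constructor is private, the builder is the only entry point, and internals can change without breaking callers.

```scala
// Illustrative builder sketch; not the real SparkSession API.
final class Session private (val appName: String,
                             val config: Map[String, String])

object Session {
  final class Builder {
    private var name: String = "default"
    private var conf: Map[String, String] = Map.empty

    def appName(n: String): Builder = { name = n; this }
    def config(k: String, v: String): Builder = { conf += (k -> v); this }

    // The private constructor means callers can only go through here,
    // so the class internals stay free to evolve.
    def getOrCreate(): Session = new Session(name, conf)
  }

  def builder(): Builder = new Builder
}
```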
