spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Structured Streaming with Kafka sources/sinks
Date Tue, 30 Aug 2016 16:12:08 GMT
In case it wasn't obvious from the ticket, I'm happy to work on this,
I just don't want to get in a situation where the work I do conflicts
with or duplicates work that's already being done.

On Tue, Aug 30, 2016 at 11:02 AM, Reynold Xin <rxin@databricks.com> wrote:
> In this case simply not much progress has been made, because people might be
> busy with other stuff.
>
> Ofir, it looks like you have spent a non-trivial amount of time thinking
> about this topic and have even designed something that works -- can you chime
> in on the JIRA ticket with your thoughts and your prototype? That would be
> tremendously useful to the project.
>
>
>
> On Tue, Aug 30, 2016 at 11:44 PM, Nicholas Chammas
> <nicholas.chammas@gmail.com> wrote:
>>
>> > I personally find it disappointing that a big chunk of Spark's design
>> > and development is happening behind closed curtains.
>>
>> I'm not too familiar with Streaming, but I see design docs and proposals
>> for ML and SQL published here and on JIRA all the time, and they are
>> discussed extensively.
>>
>> For example, here are some ML JIRAs with extensive design discussions:
>> SPARK-6725, SPARK-13944, SPARK-16365
>>
>> Nick
>>
>> On Tue, Aug 30, 2016 at 11:10 AM Cody Koeninger <cody@koeninger.org>
>> wrote:
>>>
>>> Not that I wouldn't rather have more open communication around this
>>> issue...but what are people actually expecting to get out of
>>> structured streaming with regard to Kafka?
>>>
>>> There aren't any realistic pushdown-type optimizations available, and
>>> from what I could tell the last time I looked at structured streaming,
>>> resolving the event time vs processing time issue was still a ways
>>> off.
>>>
>>> On Tue, Aug 30, 2016 at 1:56 AM, Ofir Manor <ofir.manor@equalum.io>
>>> wrote:
>>> > I personally find it disappointing that a big chunk of Spark's design and
>>> > development is happening behind closed curtains. It makes it harder than
>>> > necessary for me to work with Spark. In recent weeks we had to improvise
>>> > a temporary solution for reading from Kafka (from Structured Streaming) to
>>> > unblock our development, and I feel that if the design and development of
>>> > that feature had been done in the open, it would have saved us a lot of
>>> > hassle (and would have reduced the refactoring of our code base).
>>> >
>>> > It's hard not to compare it to other Apache projects - for example, I
>>> > believe most of the Apache Kafka full-time contributors work at a single
>>> > company, but they manage as a community to have a very transparent design
>>> > and development process, which seems to work great.
>>> >
>>> > Ofir Manor
>>> >
>>> > Co-Founder & CTO | Equalum
>>> >
>>> > Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
>>> >
>>> >
>>> > On Mon, Aug 29, 2016 at 10:39 PM, Fred Reiss <freiss.oss@gmail.com>
>>> > wrote:
>>> >>
>>> >> I think that the community really needs some feedback on the progress of
>>> >> this very important task. Many existing Spark Streaming applications can't
>>> >> be ported to Structured Streaming without Kafka support.
>>> >>
>>> >> Is there a design document somewhere?  Or can someone from the Databricks
>>> >> team break down the existing monolithic JIRA issue into smaller steps that
>>> >> reflect the current development plan?
>>> >>
>>> >> Fred
>>> >>
>>> >>
>>> >> On Sat, Aug 27, 2016 at 2:32 PM, Koert Kuipers <koert@tresata.com>
>>> >> wrote:
>>> >>>
>>> >>> that's great
>>> >>>
>>> >>> is this effort happening anywhere that is publicly visible? github?
>>> >>>
>>> >>> On Tue, Aug 16, 2016 at 2:04 AM, Reynold Xin <rxin@databricks.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> We (the team at Databricks) are working on one currently.
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Aug 15, 2016 at 7:26 PM, Cody Koeninger <cody@koeninger.org>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> https://issues.apache.org/jira/browse/SPARK-15406
>>> >>>>>
>>> >>>>> I'm not working on it (yet?), never got an answer to the question of
>>> >>>>> who was planning to work on it.
>>> >>>>>
>>> >>>>> On Mon, Aug 15, 2016 at 9:12 PM, Guo, Chenzhao
>>> >>>>> <chenzhao.guo@intel.com>
>>> >>>>> wrote:
>>> >>>>> > Hi all,
>>> >>>>> >
>>> >>>>> > I’m trying to write Structured Streaming test code and will deal with
>>> >>>>> > a Kafka source. Currently Spark 2.0 doesn’t support Kafka sources/sinks.
>>> >>>>> >
>>> >>>>> > I found some Databricks slides saying that Kafka sources/sinks will be
>>> >>>>> > implemented in Spark 2.0, so is there anybody working on this? And
>>> >>>>> > when will it be released?
>>> >>>>> >
>>> >>>>> > Thanks,
>>> >>>>> >
>>> >>>>> > Chenzhao Guo
>>> >>>>>
>>> >>>>>
>>> >>>>> ---------------------------------------------------------------------
>>> >>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>


