crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-606) Create a KafkaSource
Date Sun, 08 May 2016 20:57:12 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275734#comment-15275734 ]

Micah Whitacre commented on CRUNCH-606:
---------------------------------------

Thanks for the hint. I can easily simplify it to do what you are proposing. The one thing we
might lose is that Kafka's Serializer/Deserializer takes a "topic" argument and a
boolean "isKey" flag as well as configuration properties. By the time a record leaves the InputFormat/RecordReader
it has lost that information, so we'd give up a little flexibility. We don't actually use it right now,
but it would be nice to support it. I'll experiment with some of what you proposed and with other
options. I currently have the source implemented except for this conversion piece.
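To make the trade-off concrete: a Kafka Deserializer is configured once with the consumer properties and an isKey flag, and every deserialize call receives the topic name. If the RecordReader emits only raw bytes, the topic is gone by conversion time. The sketch below keeps the topic next to the raw key/value bytes so the conversion step can still call the deserializers with full context. All names (TopicAwareDeserializer, TaggingStringDeserializer, RawKafkaRecord, KafkaRecordConverter) are hypothetical; TopicAwareDeserializer is only a stand-in mirroring the method shape of Kafka's org.apache.kafka.common.serialization.Deserializer so the example compiles without Kafka on the classpath. It is an illustration of the idea, not the CRUNCH-606 patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-in mirroring the shape of Kafka's Deserializer:
// configure() receives the consumer config and whether this instance handles keys,
// deserialize() receives the topic the bytes came from.
interface TopicAwareDeserializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    T deserialize(String topic, byte[] data);
    void close();
}

// Example deserializer that actually depends on the topic and isKey context.
class TaggingStringDeserializer implements TopicAwareDeserializer<String> {
    private boolean isKey;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        this.isKey = isKey;
    }

    @Override
    public String deserialize(String topic, byte[] data) {
        String role = isKey ? "key" : "value";
        return topic + "/" + role + ":" + new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // Nothing to release in this example.
    }
}

// Hypothetical record emitted by the RecordReader: the topic travels with the raw bytes.
final class RawKafkaRecord {
    final String topic;
    final byte[] key;
    final byte[] value;

    RawKafkaRecord(String topic, byte[] key, byte[] value) {
        this.topic = topic;
        this.key = key;
        this.value = value;
    }
}

// Conversion step: separate key and value deserializer instances, each configured
// with the matching isKey flag, then invoked with the record's original topic.
final class KafkaRecordConverter<K, V> {
    private final TopicAwareDeserializer<K> keyDeserializer;
    private final TopicAwareDeserializer<V> valueDeserializer;

    KafkaRecordConverter(TopicAwareDeserializer<K> keyDeserializer,
                         TopicAwareDeserializer<V> valueDeserializer,
                         Map<String, ?> configs) {
        this.keyDeserializer = keyDeserializer;
        this.valueDeserializer = valueDeserializer;
        keyDeserializer.configure(configs, true);
        valueDeserializer.configure(configs, false);
    }

    Map.Entry<K, V> convert(RawKafkaRecord record) {
        K k = keyDeserializer.deserialize(record.topic, record.key);
        V v = valueDeserializer.deserialize(record.topic, record.value);
        return new AbstractMap.SimpleImmutableEntry<>(k, v);
    }
}
```

The design choice is simply to widen what the RecordReader emits: carrying the topic (and letting the converter supply isKey) preserves the full Deserializer contract at the cost of a slightly larger intermediate record.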

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the current ways to do it (Kafka Connect,
> Camus, Gobblin) do not integrate nicely with existing processing pipelines like Crunch. With
> Kafka 0.9, the consuming API is a lot easier to use, so we should build a Source implementation that
> can read from Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
