crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-606) Create a KafkaSource
Date Mon, 09 May 2016 21:22:13 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre updated CRUNCH-606:
----------------------------------
    Attachment: CRUNCH-606-byteswritable.diff

OK, I went with the simplest version I could get working by the end of the day.  The KafkaSource
always produces a PTableType<BytesWritable, BytesWritable> from the WritableTypeFamily
to avoid the Avro restriction.  All tests pass.

If we go with this approach, the one outstanding TODO I have in the code is closing the
Consumer that gets created during materialize() or by ReadableData.  I could make the iterator
close the Consumer once everything has been consumed, but then the Iterable would be single
use.  Is that OK?
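
To make the single-use trade-off concrete, here is a minimal sketch in plain Java. FakeConsumer and SingleUseIterable are hypothetical names standing in for the Kafka Consumer and the Iterable returned by materialize(); neither comes from the attached patch or from the Crunch or Kafka APIs. It only illustrates the idea of closing the underlying resource once the iterator is drained and rejecting a second iteration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a Kafka Consumer: hands out records and can be closed. */
final class FakeConsumer implements AutoCloseable {
    private final Iterator<String> records;
    private boolean closed = false;

    FakeConsumer(List<String> records) {
        this.records = records.iterator();
    }

    /** True while the consumer is open and still has records to hand out. */
    boolean hasMore() {
        return !closed && records.hasNext();
    }

    String next() {
        return records.next();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // safe to call more than once
    }
}

/** Hypothetical Iterable that may be iterated only once and closes the consumer when drained. */
final class SingleUseIterable implements Iterable<String> {
    private final FakeConsumer consumer;
    private boolean used = false;

    SingleUseIterable(FakeConsumer consumer) {
        this.consumer = consumer;
    }

    @Override
    public Iterator<String> iterator() {
        if (used) {
            // A second pass cannot work: the consumer is already in use or closed.
            throw new IllegalStateException("Iterable is single use");
        }
        used = true;
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                if (consumer.hasMore()) {
                    return true;
                }
                consumer.close(); // release the resource once everything is consumed
                return false;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return consumer.next();
            }
        };
    }
}
```

One caveat with this shape: a caller that stops iterating early never reaches the close() call, so the Consumer would stay open unless it is also closed some other way.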

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606-byteswritable.diff, CRUNCH-606.diff, CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it (Kafka Connect,
> Camus, Gobblin) do not integrate nicely with existing processing pipelines like Crunch.  With
> Kafka 0.9, the consuming API is a lot easier to use, so we should build a Source implementation
> that can read from Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
