crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-606) Create a KafkaSource
Date Sat, 07 May 2016 19:06:12 GMT


Josh Wills commented on CRUNCH-606:

My first thought would be to delegate the deserialization to the PType logic: have the
KafkaInputFormat always return instances of BytesWritable/ByteBuffer for the keys/values,
and leave it up to the PType that was passed in to the KafkaSource to map those bytes
into the appropriate type, with some helper functions along the lines of the ones we put
into the PTypes class. An AvroType is always going to expect an AvroWrapper for any
Avro-based input format, so it may be that a WritableTypeFamily/BytesWritable base for
the KafkaSource is the way to go, even when the bytes themselves are serialized with Avro.
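
To make the idea concrete, here is a rough sketch in plain Java of the split I have in mind: the
input side only ever hands back raw bytes, and a caller-supplied function (standing in for the
PType's input mapping) turns them into the real type. Every name below (RawBytesSourceSketch,
RawRecord, decodeValues, utf8, bytes) is made up for illustration; none of it is Crunch API or
taken from the attached patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: these names are not part of Crunch or the CRUNCH-606 patch.
final class RawBytesSourceSketch {

    /** Stand-in for one record as the input format would emit it: raw key and value bytes. */
    static final class RawRecord {
        final ByteBuffer key;
        final ByteBuffer value;

        RawRecord(ByteBuffer key, ByteBuffer value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Stand-in for the type-specific mapping step: the source stays byte-oriented,
     * and the decoder passed in by the caller decides what the bytes mean.
     */
    static <T> List<T> decodeValues(List<RawRecord> records, Function<ByteBuffer, T> decoder) {
        List<T> out = new ArrayList<>();
        for (RawRecord r : records) {
            // duplicate() gives the decoder its own position/limit, so the
            // original buffer can still be read again afterwards.
            out.add(decoder.apply(r.value.duplicate()));
        }
        return out;
    }

    /** Example helper decoder: interpret the remaining bytes as a UTF-8 string. */
    static String utf8(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    /** Example helper for building test input: UTF-8 encode a string into a buffer. */
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The same byte-oriented records could be decoded as strings with RawBytesSourceSketch::utf8, or as
Avro by passing a decoder that wraps an Avro reader, without the source itself knowing about either.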

> Create a KafkaSource
> --------------------
>                 Key: CRUNCH-606
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.patch
> Pulling data out of Kafka is a common use case, and some of the ways to do it (Kafka Connect,
Camus, Gobblin) do not integrate nicely with existing processing pipelines like Crunch. With
Kafka 0.9, the consuming API is a lot easier to use, so we should build a Source implementation
that can read from Kafka.

This message was sent by Atlassian JIRA
