crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-606) Create a KafkaSource
Date Mon, 09 May 2016 19:55:12 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276901#comment-15276901 ]

Micah Whitacre commented on CRUNCH-606:
---------------------------------------

Found another disconnect I'm running into as well.

{noformat}
java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper cannot be cast to org.apache.hadoop.io.NullWritable
	at org.apache.crunch.types.avro.AvroKeyConverter.convertInput(AvroKeyConverter.java:25)
{noformat}

Since a normal AvroInputFormat returns <AvroWrapper<T>, NullWritable>, that assumption bled
into the AvroKeyConverter, which expects the same.  So while the KafkaRecordReader currently
returns <K, V>, to actually fit the AvroKeyConverter it should return Pair<K, V>, or more
specifically <AvroWrapper<Pair<K, V>>, NullWritable>.  It's not great that the converter puts
restrictions on the input format, but I can possibly work around it.
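To make the mismatch concrete, here is a minimal, self-contained sketch. It does NOT use the real Avro, Hadoop, or Crunch classes: `Wrapper`, `NullValue`, and `convertInput` are hypothetical stand-ins that only mimic the shapes of AvroWrapper, NullWritable, and AvroKeyConverter.convertInput, and `AbstractMap.SimpleImmutableEntry` stands in for Crunch's Pair. It shows why a reader emitting <K, V> fails the cast to the placeholder value type, while a reader emitting <Wrapper<Pair<K, V>>, NullValue> passes through.

```java
import java.util.AbstractMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-ins for AvroWrapper, NullWritable, and
 * AvroKeyConverter. They mirror only the shapes involved in the
 * ClassCastException above; they are not the real library classes.
 */
public class ConverterMismatchSketch {

    /** Stand-in for AvroWrapper: holds a single datum. */
    static final class Wrapper<T> {
        private final T datum;
        Wrapper(T datum) { this.datum = datum; }
        T datum() { return datum; }
    }

    /** Stand-in for NullWritable: a singleton placeholder value. */
    static final class NullValue {
        static final NullValue INSTANCE = new NullValue();
        private NullValue() {}
    }

    /**
     * Stand-in for the converter: it assumes the input format produced
     * <Wrapper<T>, NullValue> pairs and simply unwraps the key. The casts
     * mimic how a mismatched reader only fails at runtime.
     */
    @SuppressWarnings("unchecked")
    static <T> T convertInput(Object key, Object value) {
        NullValue placeholder = (NullValue) value; // throws ClassCastException for a real V
        return ((Wrapper<T>) key).datum();
    }

    public static void main(String[] args) {
        // Mismatched reader: emits <K, V>, where V is itself a wrapped record.
        try {
            convertInput("offset-42", new Wrapper<>("payload"));
        } catch (ClassCastException e) {
            System.out.println("mismatch: " + e.getClass().getSimpleName());
        }

        // Matched reader: emits <Wrapper<Pair<K, V>>, NullValue>.
        Map.Entry<String, String> pair = convertInput(
                new Wrapper<>(new AbstractMap.SimpleImmutableEntry<>("offset-42", "payload")),
                NullValue.INSTANCE);
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

The design choice this illustrates is pushing the key and value into a single wrapped Pair so the value slot is free for the placeholder, which is one possible shape for the workaround mentioned above.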

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.diff, CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it (Kafka Connect,
> Camus, Gobblin) do not integrate nicely with existing processing pipelines like Crunch.  With
> Kafka 0.9, the consuming API is a lot easier, so we should build a Source implementation that
> can read from Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
