crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CRUNCH-606) Create a KafkaSource
Date Mon, 09 May 2016 19:55:13 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276901#comment-15276901 ]

Micah Whitacre edited comment on CRUNCH-606 at 5/9/16 7:54 PM:
---------------------------------------------------------------

Found another disconnect I'm running into as well.

{noformat}
java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper cannot be cast to org.apache.hadoop.io.NullWritable
	at org.apache.crunch.types.avro.AvroKeyConverter.convertInput(AvroKeyConverter.java:25)
{noformat}

Since a normal AvroInputFormat returns <AvroWrapper<T>, NullWritable>, that bled
into the AvroKeyConverter, which expects the same.  So while right now the KafkaRecordReader
is returning <K, V>, for it to actually fit the AvroKeyConverter it should be returning
Pair<K, V>.  Or, more specifically, it should be <AvroWrapper<Pair<K,V>>,
NullWritable>.  It's not great that the converter is putting restrictions on the input format,
but I can possibly tweak the input format/record reader to work around it.
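
To make the shape mismatch concrete, here is a minimal, self-contained Java sketch. It is only an illustration under stated assumptions: the nested AvroWrapper, NullWritable, and Pair classes are simplified stand-ins, not the real Avro/Hadoop/Crunch classes, and convertInput, wrap, and fits are hypothetical helpers that only mimic the key/value expectation described above, not the actual AvroKeyConverter code.

```java
/**
 * Self-contained sketch. AvroWrapper, NullWritable and Pair below are
 * simplified stand-ins (assumptions), NOT the real Avro/Hadoop/Crunch classes.
 */
public class RecordShapeSketch {

    /** Stand-in for org.apache.avro.mapred.AvroWrapper<T>. */
    public static final class AvroWrapper<T> {
        private final T datum;
        public AvroWrapper(T datum) { this.datum = datum; }
        public T datum() { return datum; }
    }

    /** Stand-in for org.apache.hadoop.io.NullWritable (a singleton). */
    public static final class NullWritable {
        private static final NullWritable INSTANCE = new NullWritable();
        private NullWritable() { }
        public static NullWritable get() { return INSTANCE; }
    }

    /** Stand-in for a Crunch-style Pair<K, V>. */
    public static final class Pair<K, V> {
        private final K first;
        private final V second;
        public Pair(K first, V second) { this.first = first; this.second = second; }
        public K first() { return first; }
        public V second() { return second; }
    }

    /**
     * Hypothetical converter that mimics the expectation described above:
     * the key must be an AvroWrapper and the value must be a NullWritable.
     */
    public static Object convertInput(Object key, Object value) {
        NullWritable.class.cast(value);        // throws ClassCastException otherwise
        return ((AvroWrapper<?>) key).datum(); // unwrap the datum carried by the key
    }

    /** Packs a Kafka key/value into the <AvroWrapper<Pair<K, V>>, NullWritable> shape. */
    public static <K, V> Object[] wrap(K key, V value) {
        return new Object[] { new AvroWrapper<>(new Pair<>(key, value)), NullWritable.get() };
    }

    /** Returns true when the record passes through convertInput without a cast failure. */
    public static boolean fits(Object key, Object value) {
        try {
            convertInput(key, value);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reader emitting <K, V> directly: the value is not a NullWritable.
        System.out.println("raw <K, V> fits: "
                + fits(new AvroWrapper<>("key"), new AvroWrapper<>("value")));

        // Reader emitting <AvroWrapper<Pair<K, V>>, NullWritable>.
        Object[] record = wrap("key", "value");
        System.out.println("wrapped pair fits: " + fits(record[0], record[1]));
    }
}
```

In this sketch the Kafka key and value travel together inside the wrapped Pair, so the value slot is free to hold the NullWritable that the converter expects.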


was (Author: mkwhitacre):
Found another disconnect I'm running into as well.

{noformat}
java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper cannot be cast to org.apache.hadoop.io.NullWritable
	at org.apache.crunch.types.avro.AvroKeyConverter.convertInput(AvroKeyConverter.java:25)
{noformat}

Since a normal AvroInputFormat returns <AvroWrapper<T>, NullWritable>, that bled
into the AvroKeyConverter, which expects the same.  So while right now the KafkaRecordReader
is returning <K, V>, for it to actually fit the AvroKeyConverter it should be returning
Pair<K, V>.  Or, more specifically, it should be <AvroWrapper<Pair<K,V>>,
NullWritable>.  It's not great that the converter is putting restrictions on the input format,
but I can possibly work around it.

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.diff, CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it (Kafka Connect,
Camus, Gobblin) do not integrate nicely with existing processing pipelines like Crunch.  With
Kafka 0.9, the consumer API is a lot easier to use, so we should build a Source implementation that
can read from Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
