hadoop-common-user mailing list archives

From Casey Green <cgr...@conductor.com>
Subject A more scalable Kafka to Hadoop InputFormat
Date Thu, 30 Oct 2014 14:32:25 GMT
Hi Folks,

I'm open sourcing a scalable Kafka InputFormat.  As far as I know, my version
is unique among the Kafka InputFormats out there in that input splits are mapped
to Kafka log files rather than to entire Kafka partitions.  Mapping Kafka log files to input
splits scales your Map/Reduce job with the amount of data left to consume in a queue, whereas
mapping input splits to entire partitions always gives you a constant number of input splits.
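
To illustrate the difference, here is a minimal sketch in plain Java. It is not the actual KafkaInputFormat code and uses no Kafka or Hadoop classes; SplitSketch, Partition and Split are hypothetical names made up for this example. It assumes each log file is identified by the first offset it contains and that the next offset to consume is known for every partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: these types are hypothetical, not the Kangaroo or Kafka API. */
public class SplitSketch {

    /** What this sketch assumes is known about one Kafka partition. */
    public static final class Partition {
        public final int id;
        public final long[] segmentStarts; // first offset of each log file, ascending
        public final long latestOffset;    // one past the last offset written
        public final long nextOffset;      // first offset not yet consumed

        public Partition(int id, long[] segmentStarts, long latestOffset, long nextOffset) {
            this.id = id;
            this.segmentStarts = segmentStarts;
            this.latestOffset = latestOffset;
            this.nextOffset = nextOffset;
        }
    }

    /** A half-open offset range [start, end) handed to one map task. */
    public static final class Split {
        public final int partition;
        public final long start;
        public final long end;

        public Split(int partition, long start, long end) {
            this.partition = partition;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "p" + partition + "[" + start + "," + end + ")";
        }
    }

    /** Partition-level mapping: one split per partition, however large the backlog. */
    public static List<Split> splitsPerPartition(List<Partition> partitions) {
        List<Split> splits = new ArrayList<>();
        for (Partition p : partitions) {
            splits.add(new Split(p.id, p.nextOffset, p.latestOffset));
        }
        return splits;
    }

    /** Log-file-level mapping: one split per log file that still holds unconsumed data. */
    public static List<Split> splitsPerLogFile(List<Partition> partitions) {
        List<Split> splits = new ArrayList<>();
        for (Partition p : partitions) {
            for (int i = 0; i < p.segmentStarts.length; i++) {
                // A log file ends where the next one starts; the last one ends at the latest offset.
                long fileEnd = (i + 1 < p.segmentStarts.length)
                        ? p.segmentStarts[i + 1]
                        : p.latestOffset;
                // Skip the part of the file that was already consumed.
                long start = Math.max(p.segmentStarts[i], p.nextOffset);
                if (start < fileEnd) {
                    splits.add(new Split(p.id, start, fileEnd));
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
                new Partition(0, new long[] {0, 1000, 2000, 3000}, 3500, 1200),
                new Partition(1, new long[] {0, 500}, 800, 800)); // fully consumed
        System.out.println("per partition: " + splitsPerPartition(partitions));
        System.out.println("per log file:  " + splitsPerLogFile(partitions));
    }
}
```

With the sample data in main, the partition-level mapping produces two splits no matter how much is left to read, while the log-file mapping produces three splits for partition 0 and none for partition 1, which has nothing left to consume.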

I wrote up a blog post about it here <http://www.conductor.com/nightlight/data-stream-processing-bulk-kafka-hadoop/>,
and the source code for my KafkaInputFormat is on GitHub <https://github.com/Conductor/kangaroo>.
Your questions, comments and feedback are welcome and much appreciated!

Casey Green
