Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 50FD82004C8 for ; Mon, 9 May 2016 23:22:15 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 4FA8B16099C; Mon, 9 May 2016 21:22:15 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 95BC0160A10 for ; Mon, 9 May 2016 23:22:14 +0200 (CEST) Received: (qmail 93731 invoked by uid 500); 9 May 2016 21:22:13 -0000 Mailing-List: contact dev-help@crunch.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@crunch.apache.org Delivered-To: mailing list dev@crunch.apache.org Received: (qmail 93498 invoked by uid 500); 9 May 2016 21:22:13 -0000 Delivered-To: apmail-incubator-crunch-dev@incubator.apache.org Received: (qmail 93458 invoked by uid 99); 9 May 2016 21:22:13 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 May 2016 21:22:13 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 6D7BD2C14F4 for ; Mon, 9 May 2016 21:22:13 +0000 (UTC) Date: Mon, 9 May 2016 21:22:13 +0000 (UTC) From: "Micah Whitacre (JIRA)" To: crunch-dev@incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CRUNCH-606) Create a KafkaSource MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 09 May 2016 21:22:15 -0000 [ https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Whitacre updated CRUNCH-606: ---------------------------------- Attachment: CRUNCH-606-byteswritable.diff Ok went with the simplest version I could get working by the end of the day. The KafkaSource always produces PTableType from the WritableTypeFamily to avoid the Avro restriction. Tests all work. If we went with this approach the one outstanding TODO I have in the code is closing out the Consumer that gets created during materialize() or ReadableData. I could make the iterator close the Consumer once all is consumed but then that'd be single use for the Iterable and is that ok? > Create a KafkaSource > -------------------- > > Key: CRUNCH-606 > URL: https://issues.apache.org/jira/browse/CRUNCH-606 > Project: Crunch > Issue Type: New Feature > Components: IO > Reporter: Micah Whitacre > Assignee: Micah Whitacre > Attachments: CRUNCH-606-byteswritable.diff, CRUNCH-606.diff, CRUNCH-606.patch > > > Pulling data out of Kafka is a common use case and some of the ways to do it Kafka Connect, Camus, Gobblin do not integrate nicely with existing processing pipelines like Crunch. With Kafka 0.9, the consuming API is a lot easier so we should build a Source implementation that can read from Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332)