crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-611) Simplified Kafka Offset Management in HDFS
Date Wed, 13 Jul 2016 15:21:20 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre updated CRUNCH-611:
----------------------------------
    Attachment: CRUNCH-611.patch

This patch provides a basic API for reading and writing Kafka offsets, along with a simple
implementation that reads and writes the values from HDFS.  In theory, this should make
regularly scheduled Crunch pipelines easier to implement with regard to offset management.
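
As a rough illustration of the kind of read/write offset API described above (this is not
the code in the attached patch), here is a minimal sketch.  Every name in it (OffsetStore,
DirectoryOffsetStore, the ".offsets" file suffix, the "topic-partition=offset" line format)
is hypothetical, and java.nio.file on a local directory stands in for HDFS so the example
stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one checkpoint file per pipeline run, named by timestamp.
class OffsetCheckpointSketch {

    // Hypothetical API: persist and read back "topic-partition -> offset" maps.
    interface OffsetStore {
        void write(long asOfTime, Map<String, Long> offsets) throws IOException;
        Map<String, Long> readLatest() throws IOException;
    }

    // Directory-backed store; a local directory stands in for an HDFS directory.
    static class DirectoryOffsetStore implements OffsetStore {
        private static final String SUFFIX = ".offsets";
        private final Path dir;

        DirectoryOffsetStore(Path dir) {
            this.dir = dir;
        }

        @Override
        public void write(long asOfTime, Map<String, Long> offsets) throws IOException {
            Files.createDirectories(dir);
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, Long> e : new TreeMap<>(offsets).entrySet()) {
                lines.add(e.getKey() + "=" + e.getValue());
            }
            // Write under a temporary name, then rename, so readLatest() (which only
            // looks at *.offsets files) does not pick up a half-written checkpoint.
            Path tmp = dir.resolve("." + asOfTime + ".tmp");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            Files.move(tmp, dir.resolve(asOfTime + SUFFIX), StandardCopyOption.REPLACE_EXISTING);
        }

        @Override
        public Map<String, Long> readLatest() throws IOException {
            Map<String, Long> result = new TreeMap<>();
            if (!Files.isDirectory(dir)) {
                return result; // no checkpoint yet: caller decides where to start
            }
            long latest = -1L;
            Path latestFile = null;
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
                for (Path p : stream) {
                    String name = p.getFileName().toString();
                    long ts = Long.parseLong(name.substring(0, name.length() - SUFFIX.length()));
                    if (ts > latest) {
                        latest = ts;
                        latestFile = p;
                    }
                }
            }
            if (latestFile == null) {
                return result;
            }
            for (String line : Files.readAllLines(latestFile, StandardCharsets.UTF_8)) {
                int idx = line.lastIndexOf('=');
                result.put(line.substring(0, idx), Long.parseLong(line.substring(idx + 1)));
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        OffsetStore store = new DirectoryOffsetStore(Files.createTempDirectory("offsets"));
        Map<String, Long> offsets = new TreeMap<>();
        offsets.put("events-0", 42L);
        offsets.put("events-1", 7L);
        store.write(System.currentTimeMillis(), offsets);
        System.out.println(store.readLatest());
    }
}
```

A scheduled pipeline built on something like this would call readLatest() at startup to
decide where to begin consuming, and write() once the run has completed successfully.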

I did add a few optional dependencies, so hopefully these won't cause serious conflicts
with the Hadoop stack.  We aren't having a problem on our cluster, but we didn't check universally.
We are also setting our classpath first and running through Oozie, which changes classpath
ordering as well.

> Simplified Kafka Offset Management in HDFS
> ------------------------------------------
>
>                 Key: CRUNCH-611
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-611
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-611.patch
>
>
> With the KafkaSource, the responsibility for offset management falls on the consumer.
> With some simple APIs, it is actually trivial to support reading and storing these offsets in an
> HDFS directory as checkpoints for the source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
