crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-611) Simplified Kafka Offset Management in HDFS
Date Wed, 13 Jul 2016 15:21:20 GMT


Micah Whitacre updated CRUNCH-611:
    Attachment: CRUNCH-611.patch

This patch provides a basic API for reading/writing Kafka offsets.  It also provides
a simple implementation that reads/writes the values from HDFS.  In theory this should
make regularly scheduled Crunch pipelines easier to implement with regard to offset management.
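
To illustrate the general idea (this is not the code in the attached patch), here is a
minimal sketch of a read/write offset API with a directory-backed implementation.  All
names (OffsetStore, DirectoryOffsetStore, the ".offsets" file suffix, the "topic-partition"
key format) are hypothetical, and java.nio.file on a local directory stands in for the
Hadoop FileSystem API so the example stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical API: these names are illustrative and are not taken from CRUNCH-611.patch.
interface OffsetStore {
  // Persist one checkpoint: the offset reached per partition as of the given time.
  void write(long asOfTime, Map<String, Long> offsets) throws IOException;

  // Return the offsets from the most recent checkpoint, or an empty map if none exists.
  Map<String, Long> readLatest() throws IOException;
}

// Stores each checkpoint as one file named "<asOfTime>.offsets" in a directory.
// Assumption: a local directory stands in for an HDFS directory here.
class DirectoryOffsetStore implements OffsetStore {
  private static final String SUFFIX = ".offsets";
  private final Path dir;

  DirectoryOffsetStore(Path dir) {
    this.dir = dir;
  }

  @Override
  public void write(long asOfTime, Map<String, Long> offsets) throws IOException {
    Files.createDirectories(dir);
    List<String> lines = new ArrayList<>();
    // One "key<TAB>offset" line per partition, sorted by key for stable output.
    for (Map.Entry<String, Long> e : new TreeMap<>(offsets).entrySet()) {
      lines.add(e.getKey() + "\t" + e.getValue());
    }
    // Write to a temporary file first, then rename, so readers never see a partial file.
    Path tmp = dir.resolve("." + asOfTime + ".tmp");
    Files.write(tmp, lines, StandardCharsets.UTF_8);
    Files.move(tmp, dir.resolve(asOfTime + SUFFIX), StandardCopyOption.REPLACE_EXISTING);
  }

  @Override
  public Map<String, Long> readLatest() throws IOException {
    Map<String, Long> result = new TreeMap<>();
    if (!Files.isDirectory(dir)) {
      return result;
    }
    Path latestFile = null;
    long latest = 0L;
    // Pick the checkpoint file with the largest timestamp in its name.
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
      for (Path f : files) {
        String name = f.getFileName().toString();
        long ts = Long.parseLong(name.substring(0, name.length() - SUFFIX.length()));
        if (latestFile == null || ts > latest) {
          latest = ts;
          latestFile = f;
        }
      }
    }
    if (latestFile == null) {
      return result;
    }
    for (String line : Files.readAllLines(latestFile, StandardCharsets.UTF_8)) {
      int tab = line.lastIndexOf('\t');
      result.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
    }
    return result;
  }
}
```

Under these assumptions, a scheduled pipeline would call readLatest() at startup to decide
where the source should begin reading, and call write() with the new end offsets once the
run has completed successfully.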

I did add a few optional dependencies, so hopefully these won't cause serious conflicts
with the Hadoop stack.  We aren't having a problem on our cluster, but we didn't check
universally.  We are also setting our classpath first and running through Oozie, which
changes the classpath ordering as well.

> Simplified Kafka Offset Management in HDFS
> ------------------------------------------
>                 Key: CRUNCH-611
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-611.patch
> With the KafkaSource, offset management is the responsibility of the consumer.
> With some simple APIs, it is actually trivial to support reading/storing these offsets
> in an HDFS directory as checkpoints for the source.

This message was sent by Atlassian JIRA
