beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gareth Western <gar...@garethwestern.com>
Subject KafkaIO: reset topic for reading from the start with every run
Date Mon, 23 Jan 2017 18:10:16 GMT
Apologies if this is slightly off-Beam-topic.

I'm setting up a small demo for some colleagues to parse a 21-million 
line CSV file using a few different runners (Direct, Flink, and Google 
Dataflow). The TextIO using the direct runner seems a bit slow, so I was 
wondering if it's perhaps related to the IO. So I pushed the CSV to a 
Kafka topic (one line per entry). Now to test the topic I'd like to read 
from the start every time. I'm quite new to Kafka, so I'm not sure how 
to "reset" it. A bit of searching indicates that something like this for 
the ConsumerProperties should be correct:

         p.apply(KafkaIO.read()
             .withBootstrapServers(options.getBrokerUrl())
             .withTopics(ImmutableList.of(options.getTopic()))
             .withValueCoder(StringUtf8Coder.of())
             .updateConsumerProperties(
*                ImmutableMap.of(**
**                    ConsumerConfig.GROUP_ID_CONFIG, 
UUID.randomUUID().toString(),**
**                    ConsumerConfig.CLIENT_ID_CONFIG, "your_client_id",**
**ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"**
**                )*
             ))

But it didn't seem to work. Is there some other setting that I'm missing?

Kind regards,

Gareth


Mime
View raw message