crunch-dev mailing list archives

From "Andrew Olson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-630) KafkaRecordReader keeps retrying to poll data when the offset is reset to latest offset
Date Mon, 12 Dec 2016 16:53:58 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742438#comment-15742438 ]

Andrew Olson commented on CRUNCH-630:
-------------------------------------

The current workaround for this bug is to set auto.offset.reset=earliest in the Kafka connection
properties when creating the KafkaSource (or alternatively org.apache.crunch.kafka.connection.properties.auto.offset.reset=earliest
in the Pipeline's Configuration).
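A minimal Java sketch of the two options above, using only java.util so it runs standalone. The class and method names are hypothetical, and "broker1:9092" is a placeholder address. In a real job the Properties would be passed to the KafkaSource, and the prefixed key would be set with pipeline.getConfiguration().set(key, value); neither call is exercised here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Sketch of the CRUNCH-630 workaround: ask the consumer to fall back to the
 * earliest retained offset, rather than the latest, when the requested offset
 * has already been removed by retention. Names here are hypothetical.
 */
public class EarliestResetWorkaround {

    // Standard Kafka consumer setting.
    static final String AUTO_OFFSET_RESET = "auto.offset.reset";

    // Prefix for Kafka connection properties stored in the Pipeline's
    // Configuration, as given in the comment above.
    static final String CRUNCH_PREFIX = "org.apache.crunch.kafka.connection.properties.";

    /** Option 1: connection properties to hand to the KafkaSource. */
    static Properties connectionProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty(AUTO_OFFSET_RESET, "earliest");
        return props;
    }

    /**
     * Option 2: the key/value pair to put in the Pipeline's Configuration.
     * Returned as a Map so the sketch runs without Hadoop on the classpath.
     */
    static Map<String, String> pipelineConfEntries() {
        Map<String, String> entries = new HashMap<>();
        entries.put(CRUNCH_PREFIX + AUTO_OFFSET_RESET, "earliest");
        return entries;
    }

    public static void main(String[] args) {
        // "broker1:9092" is a placeholder address.
        Properties props = connectionProperties("broker1:9092");
        System.out.println(props.getProperty(AUTO_OFFSET_RESET)); // prints "earliest"

        // In a real job: pipeline.getConfiguration().set(key, value) for each entry.
        pipelineConfEntries().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Either route should only change where the consumer lands after a reset; it does not by itself fix the retry-limit bug described below.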

We might consider making auto.offset.reset a config override like the serializers [1], or at least flipping
the default from latest to earliest when it's not specified.

[1] https://github.com/apache/crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/KafkaSource.java#L156-L165
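The two alternatives differ in whether a caller can still choose latest. A hypothetical sketch over plain java.util.Properties (not the actual KafkaSource code, and the method names are illustrative only):

```java
import java.util.Properties;

/**
 * Two hypothetical ways a source could treat auto.offset.reset, mirroring the
 * proposal above. Neither is Crunch's actual implementation.
 */
public class OffsetResetPolicy {

    static final String AUTO_OFFSET_RESET = "auto.offset.reset";

    /** Override: always force "earliest", ignoring whatever the caller set. */
    static Properties forceEarliest(Properties userProps) {
        Properties copy = new Properties();
        copy.putAll(userProps); // copy so the caller's Properties are not mutated
        copy.setProperty(AUTO_OFFSET_RESET, "earliest");
        return copy;
    }

    /** Default flip: use "earliest" only when the caller did not specify a value. */
    static Properties defaultToEarliest(Properties userProps) {
        Properties copy = new Properties();
        copy.putAll(userProps);
        if (copy.getProperty(AUTO_OFFSET_RESET) == null) {
            copy.setProperty(AUTO_OFFSET_RESET, "earliest");
        }
        return copy;
    }

    public static void main(String[] args) {
        Properties explicitLatest = new Properties();
        explicitLatest.setProperty(AUTO_OFFSET_RESET, "latest");

        System.out.println(forceEarliest(explicitLatest).getProperty(AUTO_OFFSET_RESET));      // prints "earliest"
        System.out.println(defaultToEarliest(explicitLatest).getProperty(AUTO_OFFSET_RESET));  // prints "latest"
        System.out.println(defaultToEarliest(new Properties()).getProperty(AUTO_OFFSET_RESET)); // prints "earliest"
    }
}
```

The override is the stricter choice: it prevents anyone from reintroducing the latest-offset behavior, at the cost of removing a knob. Flipping only the default keeps the knob but protects callers who never set it.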

> KafkaRecordReader keeps retrying to poll data when the offset is reset to latest offset
> ---------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-630
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-630
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Pooja Dhondge
>
> We recently saw this behavior where, if the offset it is trying to read from doesn't
> exist on Kafka due to the retention policy, the offset gets reset to latest (the default) and the
> KafkaRecordReader keeps retrying beyond the limit set by KAFKA_EMPTY_RETRY_ATTEMPTS_KEY:
> {noformat}
> ...crunch.kafka.inputformat.KafkaRecordReader: No records retrieved but pending offsets to consume therefore polling again. Attempt 17/10
> {noformat}
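For comparison with the "Attempt 17/10" log line, a hypothetical sketch of an empty-poll loop that honors its limit. It is not KafkaRecordReader's implementation: the poll Supplier stands in for the Kafka consumer, and maxEmptyAttempts stands in for the value configured via KAFKA_EMPTY_RETRY_ATTEMPTS_KEY.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical empty-poll retry loop that honours its attempt limit.
 * Not KafkaRecordReader's code: "poll" stands in for the Kafka consumer and
 * maxEmptyAttempts for the value configured via KAFKA_EMPTY_RETRY_ATTEMPTS_KEY.
 */
public class BoundedEmptyPoll {

    /**
     * Polls until a non-empty batch arrives or maxEmptyAttempts consecutive
     * empty polls have happened, whichever comes first. Returns the batch, or
     * an empty list once the limit is reached.
     */
    static <T> List<T> pollWithLimit(Supplier<List<T>> poll, int maxEmptyAttempts) {
        int emptyAttempts = 0;
        while (true) {
            List<T> batch = poll.get();
            if (!batch.isEmpty()) {
                return batch;
            }
            emptyAttempts++;
            // The limit check runs on every empty poll, so the counter can
            // never run past maxEmptyAttempts (unlike "Attempt 17/10").
            if (emptyAttempts >= maxEmptyAttempts) {
                return Collections.emptyList();
            }
        }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Simulates a consumer whose position was reset to the latest offset,
        // so no records ever arrive: every poll comes back empty.
        Supplier<List<String>> alwaysEmpty = () -> {
            polls[0]++;
            return Collections.emptyList();
        };
        List<String> result = pollWithLimit(alwaysEmpty, 10);
        System.out.println(result.isEmpty() + " after " + polls[0] + " polls"); // prints "true after 10 polls"
    }
}
```

What the real reader should do once the limit is hit (end the split quietly or fail the task) is a separate design question that the issue does not settle.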



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)