spark-issues mailing list archives

From "caolan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
Date Mon, 12 Dec 2016 04:08:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740906#comment-15740906 ]

caolan commented on SPARK-17147:
--------------------------------

I am using Spark 2.0.0 + Kafka 0.10 with log-compacted topics, including in a production
environment. This fix is really important, so the question is how to decide whether it is
important or not. Compacted Kafka topics should be widely used by now, and Spark 2.0 should
support them well.

As for the other issue, it did not happen all the time and had no regular pattern: sometimes
several times in one day, sometimes not for several days.
So I should enlarge spark.streaming.kafka.consumer.poll.ms, right?
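A minimal sketch of raising that timeout when building the SparkConf (the app name and the
10 second value are illustrative choices, not recommendations from this thread):

    import org.apache.spark.SparkConf

    // Raise the poll timeout used by the Kafka 0.10 direct stream's cached
    // consumers. "10000" (10 s) is an illustrative value, not a recommendation.
    val conf = new SparkConf()
      .setAppName("CompactedTopicStream") // hypothetical app name
      .set("spark.streaming.kafka.consumer.poll.ms", "10000")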

> Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17147
>                 URL: https://issues.apache.org/jira/browse/SPARK-17147
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: Robert Conrad
>
> When Kafka does log compaction, offsets often end up with gaps, meaning the next requested
offset will frequently not be offset + 1. The logic in KafkaRDD & CachedKafkaConsumer
has a baked-in assumption that the next offset will always be just an increment of 1 above
the previous offset. 
> I have worked around this problem by changing CachedKafkaConsumer to use the returned
record's offset, from:
> {{nextOffset = offset + 1}}
> to:
> {{nextOffset = record.offset + 1}}
> and changed KafkaRDD from:
> {{requestOffset += 1}}
> to:
> {{requestOffset = r.offset() + 1}}
> (I also had to change some assert logic in CachedKafkaConsumer).
> There's a strong possibility that I have misconstrued how to use the streaming Kafka
consumer, and I'm happy to close this out if that's the case. If, however, it is supposed
to support non-consecutive offsets (e.g. due to log compaction), I am also happy to contribute
a PR.
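
A hedged illustration of the workaround described in the report above. This is a minimal
sketch, not the actual Spark internals: the class name, constructor, and caching details
are hypothetical, and only the offset-advance line reflects the proposed change:

    import java.{util => ju}
    import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition

    // Hypothetical per-partition consumer wrapper (not Spark's real
    // CachedKafkaConsumer) showing the offset-advance change.
    class CompactionAwareConsumer[K, V](
        consumer: KafkaConsumer[K, V],
        topicPartition: TopicPartition,
        pollTimeoutMs: Long) {

      private var buffer: ju.Iterator[ConsumerRecord[K, V]] =
        ju.Collections.emptyIterator[ConsumerRecord[K, V]]()
      private var nextOffset: Long = -2L // sentinel: no position cached yet

      def get(offset: Long): ConsumerRecord[K, V] = {
        if (offset != nextOffset || !buffer.hasNext) {
          // Cached position is unusable; seek and refill the buffer.
          consumer.seek(topicPartition, offset)
          buffer = consumer.poll(pollTimeoutMs).iterator()
        }
        val record = buffer.next()
        // Note: no assert that record.offset == offset here; on a compacted
        // topic the returned record may legitimately skip ahead (this mirrors
        // the reporter's remark about relaxing the assert logic).
        // Original assumption: nextOffset = offset + 1, which breaks when
        // compaction leaves gaps. Workaround: advance from the offset Kafka
        // actually returned.
        nextOffset = record.offset + 1
        record
      }
    }

The key design point is that the consumer's cursor is derived from what the broker returned
rather than from what the caller requested, so gaps left by compaction do not desynchronize
the cache.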



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
