spark-issues mailing list archives

From "sirisha (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23685) Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
Date Fri, 16 Mar 2018 03:48:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401428#comment-16401428
] 

sirisha edited comment on SPARK-23685 at 3/16/18 3:47 AM:
----------------------------------------------------------

[~apachespark] Can anyone please guide me on how to assign this story to myself? I do not
see an option to assign it to myself.


was (Author: sindiri):
[~apachespark] Can anyone please guide me on how to assign this pull request to myself? 
I do not see an option to assign it to myself.

> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka does log compaction, offsets often end up with gaps, meaning the next requested
> offset will frequently not be offset + 1. The logic in KafkaSourceRDD & CachedKafkaConsumer
> assumes that the next offset will always be an increment of 1. If not, it throws the exception
> below:
>  
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). Some data
may have been lost because they are not available in Kafka any more; either the data was aged
out by Kafka or the topic may have been deleted before all the data in the topic was processed.
If you don't want your streaming query to fail on such cases, set the source option "failOnDataLoss"
to "false". "
>  
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
>  
>  
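
For context, here is a minimal sketch (in plain Kafka consumer terms, not Spark's actual
KafkaSourceRDD/CachedKafkaConsumer code) of a fetch loop that tolerates the offset gaps
log compaction leaves behind instead of failing. The fetchRange helper and its parameters
are hypothetical, and it assumes the partition has already been assign()ed to the consumer:

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition

    // Hypothetical helper, not Spark's implementation: read [fromOffset, untilOffset),
    // accepting that compaction may have removed some offsets in between.
    def fetchRange[K, V](
        consumer: KafkaConsumer[K, V],
        tp: TopicPartition,
        fromOffset: Long,
        untilOffset: Long): Seq[ConsumerRecord[K, V]] = {
      consumer.seek(tp, fromOffset)
      val buf = scala.collection.mutable.ArrayBuffer.empty[ConsumerRecord[K, V]]
      var next = fromOffset
      while (next < untilOffset) {
        // poll(long) is the Kafka 0.10 consumer API; newer clients use poll(Duration).
        val records = consumer.poll(500L).records(tp).asScala
        if (records.isEmpty) {
          // Nothing returned even though next < untilOffset: genuine data loss.
          throw new IllegalStateException(
            s"Cannot fetch records in [$next, $untilOffset) for $tp")
        }
        records.foreach { r =>
          // On a compacted topic r.offset() may jump past `next`;
          // treat the gap as normal rather than throwing.
          if (r.offset() >= next && r.offset() < untilOffset) buf += r
          next = math.max(next, r.offset() + 1)
        }
      }
      buf.toSeq
    }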
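
Separately, the error message itself points at the existing escape hatch: setting the source
option "failOnDataLoss" to "false" makes the query log a warning instead of failing, at the
cost of silently skipping whatever really is missing. A minimal sketch, assuming an existing
SparkSession named spark; the server and topic names are placeholders:

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092") // placeholder
      .option("subscribe", "some-topic")               // placeholder
      .option("failOnDataLoss", "false")
      .load()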



