spark-issues mailing list archives

From "Bhaskar E (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23017) Why would spark-kafka stream fail stating `Got wrong record for <groupid> <topic> <partition> even after seeking to offset #` when using kafka API to commit offset
Date Tue, 09 Jan 2018 23:48:03 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Bhaskar E updated SPARK-23017:
------------------------------
    Description: 
My spark-kafka streaming job started failing after repeatedly logging the message `Got wrong record for <groupid> <topic> <partition> even after seeking to offset #`.

I disabled `enable.auto.commit` and am saving the commits (to kafka itself) manually using the kafka API:
{code}((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges.get());{code}

When I commit offsets to kafka manually and my job later resumes requesting data (say after 1 hr, recovering from some failure), kafka should serve the next available offsets, starting from the last committed offset.
In other words, since kafka itself stores my committed offsets, my spark job does not decide on its own which offset to request next. Yet the error message states that it `Got wrong record ... even after seeking to a particular offset #`. *So, how is this possible?*
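The resume behaviour described above (kafka hands back the next offset after the last committed one) can be sketched as a minimal model; this is plain illustrative Java of the commit/position bookkeeping convention, not Spark or kafka-clients code, and the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of Kafka's committed-offset bookkeeping: by convention the
// committed offset for a partition is the offset of the NEXT record to read,
// i.e. lastProcessedOffset + 1 (an OffsetRange's untilOffset).
class OffsetStore {
    private final Map<String, Long> committed = new HashMap<>();

    // Record that everything before untilOffset has been processed.
    void commit(String topicPartition, long untilOffset) {
        committed.put(topicPartition, untilOffset);
    }

    // Where a consumer in this group resumes after a restart: the committed
    // offset if one exists, otherwise the earliest available offset.
    long positionOnRestart(String topicPartition, long earliestOffset) {
        return committed.getOrDefault(topicPartition, earliestOffset);
    }
}
```

So after committing a batch covering offsets [100, 150), a restarted job asks kafka for its committed position and resumes at 150.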

Even if I assume that the spark driver fetches some offset ranges from kafka ahead of time (before actually reading the records) and then requests those offsets, it is still confusing: how could spark receive the wrong record when it is requesting offsets that it got from kafka itself in the first place?
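For what it's worth, the check that produces this message can be modelled in a few lines of plain Java. This is only an illustrative, self-contained sketch of a consumer-side assertion (the class names `PartitionLog` and `StrictFetcher` are made up for this example): a seek-then-poll returns the first record at or after the requested offset, and the fetcher fails fast when that record's offset is not exactly the one requested, which happens when the requested offset no longer exists in the log (e.g. removed by compaction or retention).

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal model of one partition's log, where some offsets may be absent
// (e.g. removed by log compaction or retention).
class PartitionLog {
    private final TreeMap<Long, String> records = new TreeMap<>();

    void append(long offset, String value) {
        records.put(offset, value);
    }

    // Models seek(offset) followed by poll(): Kafka returns the first record
    // at or AFTER the requested offset, not necessarily the record whose
    // offset equals the requested one.
    Map.Entry<Long, String> fetchFrom(long offset) {
        return records.ceilingEntry(offset);
    }
}

class StrictFetcher {
    // Models the executor-side assertion: the fetched record must sit at
    // exactly the requested offset, otherwise fail with the error seen above.
    static String get(PartitionLog log, long offset,
                      String groupId, String topic, int partition) {
        Map.Entry<Long, String> rec = log.fetchFrom(offset);
        if (rec == null || rec.getKey() != offset) {
            throw new IllegalStateException(
                "Got wrong record for " + groupId + " " + topic + " " + partition
                + " even after seeking to offset " + offset);
        }
        return rec.getValue();
    }
}
```

In this model the failure appears exactly when the requested offset is missing from the log, which is one plausible way a job resuming from old committed offsets could hit the real error.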



> Why would spark-kafka stream fail stating `Got wrong record for <groupid> <topic>
<partition> even after seeking to offset #` when using kafka API to commit offset
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23017
>                 URL: https://issues.apache.org/jira/browse/SPARK-23017
>             Project: Spark
>          Issue Type: Question
>          Components: Structured Streaming
>    Affects Versions: 2.2.1
>            Reporter: Bhaskar E
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

