spark-reviews mailing list archives

From tcondie <>
Subject [GitHub] spark pull request #15852: Spark 18187
Date Fri, 11 Nov 2016 18:38:14 GMT
GitHub user tcondie opened a pull request:

    Spark 18187

    ## What changes were proposed in this pull request?
    CompactibleFileStreamLog relies on "compactInterval" to detect a compaction batch. If the
user resets "compactInterval", CompactibleFileStreamLog returns the wrong answer, resulting
in data loss. This PR provides a way to check the validity of "compactInterval" and to
calculate an appropriate value.
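
To illustrate the failure mode described above, here is a minimal sketch, not the code in this PR. It assumes a compaction batch is any 0-based batch id for which `(batch_id + 1) % compact_interval == 0`; the function names `is_compaction_batch` and `is_valid_interval` are hypothetical and chosen only for this example. The intervals 4 and 5 mirror the change exercised by the unit test commit.

```python
def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    """Assumed rule: every compact_interval-th batch (0-based ids) is a compaction batch."""
    return (batch_id + 1) % compact_interval == 0


def is_valid_interval(latest_compaction_batch_id: int, compact_interval: int) -> bool:
    """Hypothetical validity check: the configured interval must still classify
    the latest compaction batch found in the log as a compaction batch."""
    return is_compaction_batch(latest_compaction_batch_id, compact_interval)


# Batches 0..7 written with interval 4: the compaction batches are 3 and 7.
written_with = 4
compacted = [b for b in range(8) if is_compaction_batch(b, written_with)]

# After a restart with interval 5, neither existing compaction batch is
# recognized, so a reader driven purely by the interval looks in the wrong place.
restarted_with = 5
recognized = [b for b in compacted if is_compaction_batch(b, restarted_with)]

# The check rejects 5 but accepts 4, the interval the log was written with.
interval_5_ok = is_valid_interval(compacted[-1], restarted_with)
interval_4_ok = is_valid_interval(compacted[-1], written_with)
```

Under this assumption, any interval that divides `latest_compaction_batch_id + 1` passes the check, which suggests one way an appropriate replacement value could be calculated when the configured one is rejected.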
    ## How was this patch tested?
    When restarting a stream, we change 'spark.sql.streaming.fileSource.log.compactInterval'
to a value different from the former one.
    The primary solution to this issue was given by @uncleGen 
    The added extensions include an additional metadata field in the OffsetSeq and
CompactibleFileStreamLog APIs. @zsxwing 

You can merge this pull request into a Git repository by running:

    $ git pull spark-18187

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15852
commit 65395dddb505f6084db471430da1486d75a77e2a
Author: genmao.ygm <genmao.ygm@genmaoygmdemacbook-air.local>
Date:   2016-11-09T08:21:09Z

    SPARK-18187: CompactibleFileStreamLog should not rely on "compactInterval" to detect a
compaction batch

commit d556933e0f039d661989e07f381aff185c9fac1b
Author: genmao.ygm <genmao.ygm@genmaoygmdemacbook-air.local>
Date:   2016-11-09T08:24:53Z

    comment update

commit 8b56f70b2dffd69dbc37007e923f3d5a56fce039
Author: genmao.ygm <genmao.ygm@genmaoygmdemacbook-air.local>
Date:   2016-11-09T08:34:11Z


commit 4a7e28c4e372caa3b16b979273577bd6aa2c11f3
Author: genmao.ygm <genmao.ygm@genmaoygmdemacbook-air.local>
Date:   2016-11-09T08:35:13Z

    unit test - compacat metadata log
    change compactInterval from 4 to 5

commit 23e1baf454bde511ed1963a27f6492100823d496
Author: genmao.ygm <genmao.ygm@genmaoygmdemacbook-air.local>
Date:   2016-11-09T09:34:15Z

    bug fix: /zero

commit 7d37e08026eaa1364e8a4fb10fb7cfb93cb51229
Author: Tyson Condie <>
Date:   2016-11-11T00:50:02Z

    Merge branch 'SPARK-18187' of into spark-18187

commit d3f7bbf32d0debba24853a38eb48bfcdcdb517be
Author: Tyson Condie <>
Date:   2016-11-11T00:52:24Z

    Merge branch 'master' of into spark-18187

commit 6901eacdddf235db4ba91a0903ce8826978d778a
Author: Tyson Condie <>
Date:   2016-11-11T18:16:41Z

    extend offset seq to include metadata


