From: tcondie
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #15852: Spark 18187
Date: Fri, 11 Nov 2016 18:38:14 +0000 (UTC)

GitHub user tcondie opened a pull request:

https://github.com/apache/spark/pull/15852

Spark 18187

## What changes were proposed in this pull request?

CompactibleFileStreamLog relies on "compactInterval" to detect a compaction batch. If the user changes "compactInterval", CompactibleFileStreamLog misidentifies which batches are compaction batches and returns wrong results, which can cause data loss. This PR provides a way to check the validity of "compactInterval" and to calculate an appropriate value.

## How was this patch tested?
Restart a stream with a 'spark.sql.streaming.fileSource.log.compactInterval' value different from the one used before the restart.

The primary solution to this issue was given by @uncleGen. This PR extends it with an additional metadata field in OffsetSeq and additions to the CompactibleFileStreamLog APIs.

@zsxwing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tcondie/spark spark-18187

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15852.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15852

----
commit 65395dddb505f6084db471430da1486d75a77e2a
Author: genmao.ygm
Date:   2016-11-09T08:21:09Z

    SPARK-18187: CompactibleFileStreamLog should not rely on "compactInterval" to detect a compaction batch

commit d556933e0f039d661989e07f381aff185c9fac1b
Author: genmao.ygm
Date:   2016-11-09T08:24:53Z

    comment update

commit 8b56f70b2dffd69dbc37007e923f3d5a56fce039
Author: genmao.ygm
Date:   2016-11-09T08:34:11Z

    revert

commit 4a7e28c4e372caa3b16b979273577bd6aa2c11f3
Author: genmao.ygm
Date:   2016-11-09T08:35:13Z

    unit test - compacat metadata log change compactInterval from 4 to 5

commit 23e1baf454bde511ed1963a27f6492100823d496
Author: genmao.ygm
Date:   2016-11-09T09:34:15Z

    bug fix: /zero

commit 7d37e08026eaa1364e8a4fb10fb7cfb93cb51229
Author: Tyson Condie
Date:   2016-11-11T00:50:02Z

    Merge branch 'SPARK-18187' of https://github.com/uncleGen/spark into spark-18187

commit d3f7bbf32d0debba24853a38eb48bfcdcdb517be
Author: Tyson Condie
Date:   2016-11-11T00:52:24Z

    Merge branch 'master' of https://github.com/apache/spark into spark-18187

commit 6901eacdddf235db4ba91a0903ce8826978d778a
Author: Tyson Condie
Date:   2016-11-11T18:16:41Z

    extend offset seq to include metadata

----
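The PR summary describes the compaction-batch problem only in prose. The following Scala sketch illustrates it under explicit assumptions: it assumes a batch is treated as a compaction batch when `(batchId + 1) % compactInterval == 0`, and the names `CompactIntervalSketch`, `isCompactionBatch`, `compactionBatches`, and `deriveCompactInterval` are hypothetical, not Spark APIs. The derivation rule (pick the smallest divisor of `latestCompactBatchId + 1` that is at least the requested interval) is one possible way to "calculate an appropriate value", not necessarily the rule this PR implements.

```scala
// Hypothetical sketch; not Spark's actual source.
// Assumption: batch N is a compaction batch iff (N + 1) is a multiple of compactInterval.
object CompactIntervalSketch {

  /** True if batchId would be treated as a compaction batch under compactInterval. */
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  /** All compaction batch IDs in [0, upTo] for the given interval. */
  def compactionBatches(upTo: Long, compactInterval: Int): Seq[Long] =
    (0L to upTo).filter(isCompactionBatch(_, compactInterval))

  /**
   * Picks an interval that keeps an already-written compaction batch recognizable:
   * the result always divides (latestCompactBatchId + 1), so
   * isCompactionBatch(latestCompactBatchId, result) stays true.
   * Assumed rule: the smallest such divisor that is >= the requested interval.
   */
  def deriveCompactInterval(requested: Int, latestCompactBatchId: Int): Int = {
    require(requested > 0, "compactInterval must be positive")
    require(latestCompactBatchId >= 0, "latestCompactBatchId must be non-negative")
    val n = latestCompactBatchId + 1
    if (n <= requested) n
    else (requested to n).find(n % _ == 0).getOrElse(n) // n always divides itself
  }
}
```

For example, with an interval of 4 the compaction batches up to batch 11 are 3, 7, and 11. After a restart with an interval of 5, batch 7 is no longer recognized because `(7 + 1) % 5 != 0`; in this sketch, `deriveCompactInterval(5, 7)` returns 8, which keeps batch 7 a compaction batch.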