From: "Mendelson, Assaf"
To: "dev@spark.apache.org"
Subject: structured streaming documentation does not match behavior
Date: Thu, 15 Jun 2017 17:46:37 +0000

Hi,

I have started to play around with Structured Streaming, and it seems the documentation (the Structured Streaming Programming Guide) does not match the actual behavior I am seeing.

The documentation says that maxFilesPerTrigger (as well as latestFirst) are options for the file sink. In practice, however, at least maxFilesPerTrigger does not seem to have any real effect when set on the sink. On the other hand, the streaming source (readStream), which has no documentation for this option, does limit the number of files.

This behavior actually makes more sense than the documentation, since I expect the file reader, not the sink, to define how files are read (e.g. if I used a Kafka sink or a foreach sink, I should still get the same reading behavior).

Thanks,
Assaf.