Date: Wed, 2 Aug 2017 13:57:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@nifi.apache.org
Subject: [jira] [Commented] (NIFI-3484) GenerateTableFetch Should Allow for Right Boundary

[ https://issues.apache.org/jira/browse/NIFI-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110950#comment-16110950 ]

ASF GitHub Bot commented on NIFI-3484:
--------------------------------------

Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/1513

    Hey @patricker, thanks for this PR and sorry it took so long to get to it. I just reviewed your work and it looks valid. I was able to confirm that it fixes the data duplication issue I was seeing in my environment.

    I have pushed a commit here: https://github.com/pvillard31/nifi/tree/PR1513

    It fixes a Checkstyle issue in your PR and also adds a unit test that demonstrates the existing data duplication. If you agree with it, can you add it to your PR? Then I'll get everything merged.

    Thanks a lot!

> GenerateTableFetch Should Allow for Right Boundary
> --------------------------------------------------
>
>                 Key: NIFI-3484
>                 URL: https://issues.apache.org/jira/browse/NIFI-3484
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Peter Wicks
>            Assignee: Peter Wicks
>            Priority: Minor
>
> GenerateTableFetch places no right-hand boundary on pages of data. This causes problems when a statement asks for the next 1000 records greater than a specific key, but records are added to the table between the time the processor runs and the time the SQL is executed. As a result, it pulls in records that did not exist when the processor ran, and the next execution of the processor pulls those records in a second time.
>
> Example:
> Partition Size = 1000
> First run (no state): COUNT(*) = 4700 and MAX(ID) = 4700.
> 5 FlowFiles are generated; the last one says to fetch 1000 rows, not 700. (I don't think this is really a bug, just an observation.)
> The 5 FlowFiles are now queued for ExecuteSQL. Before the 5th one executes, 400 new rows are added to the table. When the final SQL statement runs, it still returns 1000 rows: the last 700 original rows plus 300 extra records with higher ID values, which are also pulled into NiFi.
> Second run (state: ID = 4700): COUNT(*) WHERE ID > 4700 = 400 and MAX(ID) = 5100.
> 1 FlowFile is generated, but it includes the 300 records already pulled into NiFi.
>
> The solution is an optional property that lets users use the new MAX(ID) as a right boundary when generating queries.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
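The duplication in the issue's example can be replayed with a small simulation. This is a minimal sketch, not NiFi's implementation: it assumes each generated query pages over the rows above the stored maximum with ORDER BY/LIMIT/OFFSET (an assumption consistent with the issue's arithmetic), and `PageQuery`, `generate_table_fetch`, `execute`, and `simulate` are hypothetical names used only for illustration. It runs the 4700-row first run, inserts 400 rows before the 5th page executes, then performs the second run, both with and without a right boundary.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageQuery:
    """Hypothetical model of one generated fetch:
    WHERE id > lower [AND id <= upper] ORDER BY id LIMIT limit OFFSET offset."""
    lower: int            # exclusive lower bound (max ID stored in state)
    upper: Optional[int]  # inclusive right boundary, or None when there is none
    offset: int
    limit: int


def generate_table_fetch(table: List[int], last_max: int, partition_size: int,
                         right_boundary: bool) -> Tuple[List[PageQuery], int]:
    """Hypothetical model of one GenerateTableFetch run.
    Returns the page queries and the new max ID to store as state."""
    new_rows = [i for i in table if i > last_max]
    if not new_rows:
        return [], last_max
    new_max = max(new_rows)
    pages = math.ceil(len(new_rows) / partition_size)
    upper = new_max if right_boundary else None
    queries = [PageQuery(last_max, upper, p * partition_size, partition_size)
               for p in range(pages)]
    return queries, new_max


def execute(table: List[int], q: PageQuery) -> List[int]:
    """Model of ExecuteSQL running one page query against the table as it is *now*."""
    rows = sorted(i for i in table
                  if i > q.lower and (q.upper is None or i <= q.upper))
    return rows[q.offset:q.offset + q.limit]


def simulate(right_boundary: bool) -> List[int]:
    table = list(range(1, 4701))  # first run: COUNT(*) = 4700, MAX(ID) = 4700
    fetched: List[int] = []

    queries, state = generate_table_fetch(table, 0, 1000, right_boundary)
    for n, q in enumerate(queries, start=1):
        if n == 5:
            table.extend(range(4701, 5101))  # 400 rows inserted before the 5th page runs
        fetched.extend(execute(table, q))

    # Second run starts from the stored state (ID = 4700).
    queries, state = generate_table_fetch(table, state, 1000, right_boundary)
    for q in queries:
        fetched.extend(execute(table, q))
    return fetched


def duplicates(rows: List[int]) -> int:
    return len(rows) - len(set(rows))


print("without right boundary:", duplicates(simulate(False)))  # 300
print("with right boundary:", duplicates(simulate(True)))      # 0
```

The only difference between the two runs is the inclusive `id <= upper` bound taken from MAX(ID) at generation time, which models the optional property the issue proposes: rows inserted after the queries are generated no longer shift into the current pages and are instead picked up once by the next run.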