nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-3484) GenerateTableFetch Should Allow for Right Boundary
Date Fri, 18 Aug 2017 03:20:00 GMT

    [ https://issues.apache.org/jira/browse/NIFI-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131659#comment-16131659 ]

ASF GitHub Bot commented on NIFI-3484:
--------------------------------------

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2091#discussion_r133872895
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
    @@ -112,6 +112,17 @@
                 .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
                 .build();
     
    +    public static final PropertyDescriptor RIGHT_BOUND_WHERE = new PropertyDescriptor.Builder()
    --- End diff --
    
    I am OK with proceeding that way, though I'd feel better if I knew how many databases
    this has been tested on. When I wrote it, my focus was on one relatively uncommon system
    (SAP HANA). It tests out fine, but I worry about making it the default.
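To make the default-behavior question concrete, here is a hypothetical sketch of how page queries could be generated with an optional right bound. It is not the code from pull request 2091: the class and method names, the table `my_table`, the `ID` column, and LIMIT/OFFSET paging are all assumptions. The `<=` predicate itself is ordinary SQL, but paging syntax is not uniform across databases, so checking behavior on more than one system is worthwhile before turning this on by default.

```java
import java.util.ArrayList;
import java.util.List;

public class RightBoundQuerySketch {

    // Hypothetical helper: builds one page query per partition. leftBound is the
    // last stored maximum (null on the first run); rightBound, when non-null, is
    // the MAX(column) observed when the queries were generated.
    static List<String> pageQueries(String table, String column, Long leftBound,
                                    Long rightBound, long rowCount, long partitionSize) {
        List<String> conditions = new ArrayList<>();
        if (leftBound != null) {
            conditions.add(column + " > " + leftBound);
        }
        if (rightBound != null) {
            conditions.add(column + " <= " + rightBound);
        }
        String where = conditions.isEmpty() ? "" : " WHERE " + String.join(" AND ", conditions);

        List<String> queries = new ArrayList<>();
        long pages = (rowCount + partitionSize - 1) / partitionSize; // ceiling division
        for (long page = 0; page < pages; page++) {
            // Assumption: LIMIT/OFFSET paging; other databases spell this differently.
            queries.add("SELECT * FROM " + table + where + " ORDER BY " + column
                    + " LIMIT " + partitionSize + " OFFSET " + (page * partitionSize));
        }
        return queries;
    }

    public static void main(String[] args) {
        // First run from the issue: COUNT(*) = 4700, MAX(ID) = 4700, partition size 1000.
        for (String q : pageQueries("my_table", "ID", null, 4700L, 4700, 1000)) {
            System.out.println(q);
        }
    }
}
```

With the bound in place, the fifth page reads `... WHERE ID <= 4700 ORDER BY ID LIMIT 1000 OFFSET 4000`, so rows inserted after generation cannot slip into it.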


> GenerateTableFetch Should Allow for Right Boundary
> --------------------------------------------------
>
>                 Key: NIFI-3484
>                 URL: https://issues.apache.org/jira/browse/NIFI-3484
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Peter Wicks
>            Assignee: Peter Wicks
>            Priority: Minor
>
> GenerateTableFetch places no right-hand boundary on pages of data. This can lead to issues
> when the statement says to get the next 1000 records greater than a specific key, but records
> were added to the table between the time the processor ran and the time the SQL is executed.
> As a result, it pulls in records that did not exist when the processor was run. On the next
> execution of the processor, these records will be pulled in a second time.
> Example:
> Partition Size = 1000
> First run (no state): COUNT(*) = 4700 and MAX(ID) = 4700.
> 5 FlowFiles are generated; the last one will say to fetch 1000, not 700. (But I don't think
> this is really a bug, just an observation.)
> 5 FlowFiles are now in the queue to be executed by ExecuteSQL. Before the 5th file can execute,
> 400 new rows are added to the table. When the final SQL statement is executed, 300 extra records
> with higher ID values will also be pulled into NiFi.
> Second run (state: ID = 4700): COUNT(*) WHERE ID > 4700 = 400 and MAX(ID) = 5100.
> 1 FlowFile is generated, but it includes 300 records already pulled into NiFi.
> The solution is to have an optional property that lets users use the new MAX(ID) as a right
> boundary when generating queries.
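The arithmetic in the example above can be replayed in memory. This is a minimal Java sketch, assuming each generated query behaves like `WHERE ID > left [AND ID <= right] ORDER BY ID` with OFFSET/LIMIT paging; the list stands in for the table, and `fetchPage` and `replay` are hypothetical stand-ins for ExecuteSQL running the generated statements, not NiFi APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class RightBoundSimulation {

    // Hypothetical stand-in for one page query:
    // WHERE id > leftBound [AND id <= rightBound] ORDER BY id OFFSET offset LIMIT size.
    static List<Long> fetchPage(List<Long> table, long leftBound, Long rightBound,
                                long offset, long size) {
        return table.stream()
                .filter(id -> id > leftBound)
                .filter(id -> rightBound == null || id <= rightBound)
                .sorted()
                .skip(offset)
                .limit(size)
                .collect(Collectors.toList());
    }

    // Replays the issue's scenario; returns {rows fetched on run 1, rows fetched twice}.
    static long[] replay(boolean useRightBound) {
        List<Long> table = LongStream.rangeClosed(1, 4700).boxed()
                .collect(Collectors.toCollection(ArrayList::new));
        final long partitionSize = 1000;

        // First run (no state): COUNT(*) = 4700, MAX(ID) = 4700, so 5 pages are generated.
        final long firstMax = 4700;
        Long firstRight = useRightBound ? Long.valueOf(firstMax) : null;
        Set<Long> fetched = new HashSet<>();
        long firstRunRows = 0;
        for (long page = 0; page < 5; page++) {
            if (page == 4) {
                // 400 new rows are inserted before the 5th FlowFile is executed.
                for (long id = 4701; id <= 5100; id++) {
                    table.add(id);
                }
            }
            List<Long> rows = fetchPage(table, 0, firstRight, page * partitionSize, partitionSize);
            firstRunRows += rows.size();
            fetched.addAll(rows);
        }

        // Second run (state: ID = 4700); the new MAX(ID) is 5100.
        Long secondRight = useRightBound ? Long.valueOf(5100) : null;
        List<Long> secondRun = fetchPage(table, firstMax, secondRight, 0, partitionSize);
        long duplicates = secondRun.stream().filter(fetched::contains).count();
        return new long[] {firstRunRows, duplicates};
    }

    public static void main(String[] args) {
        long[] unbounded = replay(false);
        long[] bounded = replay(true);
        System.out.println("no right bound:   first run " + unbounded[0]
                + " rows, duplicates " + unbounded[1]);
        System.out.println("with right bound: first run " + bounded[0]
                + " rows, duplicates " + bounded[1]);
    }
}
```

Without a right bound the first run returns 5000 rows and the second run re-fetches 300 of them (IDs 4701 to 5000), matching the issue. Capping the first run at the MAX(ID) seen at generation time (4700) yields 4700 rows and no duplicates, and the second run still picks up all 400 new rows.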



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
