nifi-issues mailing list archives

From "Alessandro D'Armiento (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NIFI-6465) ListHDFS: skip last should be optional
Date Mon, 22 Jul 2019 12:56:00 GMT

     [ https://issues.apache.org/jira/browse/NIFI-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro D'Armiento updated NIFI-6465:
----------------------------------------
    Description: 
h2. Current Situation

From [official documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]

* Each time a listing is performed, the files with the latest timestamp will be excluded and
picked up during the next execution of the processor. This is done to ensure that we do not
miss any files, or produce duplicates, in the cases where files with the same timestamp are
written immediately before and after a single execution of the processor.
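
The quoted behavior can be illustrated with a small sketch. This is not the ListHDFS source: the function name {{list_once}}, the (path, timestamp) tuples and the two pieces of state are hypothetical, and "picked up during the next execution" is modelled here by assuming the processor remembers which timestamp it held back. The real processor's state handling may differ.

```python
from typing import List, Optional, Tuple

FileEntry = Tuple[str, int]  # (path, modification timestamp); hypothetical shape


def list_once(
    entries: List[FileEntry],
    emitted_ts: int,
    held_ts: Optional[int],
) -> Tuple[List[str], int, Optional[int]]:
    """Simulate one listing run.

    emitted_ts: newest timestamp already emitted by earlier runs.
    held_ts:    timestamp that the previous run held back, if any.
    Returns (paths emitted now, new emitted_ts, new held_ts).
    """
    candidates = [(p, ts) for p, ts in entries if ts > emitted_ts]
    if not candidates:
        return [], emitted_ts, None

    newest = max(ts for _, ts in candidates)
    if newest == held_ts:
        # Assumption: the newest timestamp was already seen once, so emit it now.
        to_emit = candidates
        new_held = None
    else:
        # Hold back every file that shares the newest timestamp.
        to_emit = [(p, ts) for p, ts in candidates if ts < newest]
        new_held = newest

    new_emitted = max((ts for _, ts in to_emit), default=emitted_ts)
    return sorted(p for p, _ in to_emit), new_emitted, new_held


# Run 1: "b" carries the newest timestamp, so only "a" is emitted.
run1 = list_once([("a", 90), ("b", 100)], emitted_ts=0, held_ts=None)
# Run 2: "c" was written after run 1 with the same timestamp as "b".
# Both are emitted together, so "c" is not missed and "b" is not duplicated.
run2 = list_once([("a", 90), ("b", 100), ("c", 100)], run1[1], run1[2])
```

Had run 1 emitted "b" immediately and moved its watermark to 100, a later file "c" with the same timestamp would have been either missed or forced a re-listing of "b". The price is that the newest files always wait one extra run.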

h2. Improvement Proposal

* If we are calling the ListHDFS only after a certain operation which populates an HDFS directory
has finished, it is pointless to skip the last file, and avoiding this behavior is tricky.
* A mandatory property "skip last" should be implemented in order to be able to actively decide
whether or not this behavior is necessary, based on the use case.
* This is also particularly useful in combination with [NIFI-6462]
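
A minimal sketch of what the proposed switch could look like, assuming a boolean flag. The names {{select_files}} and {{skip_last}} are hypothetical and only illustrate the idea; nothing here is taken from the NiFi code base.

```python
from typing import List, Tuple

FileEntry = Tuple[str, int]  # (path, modification timestamp); hypothetical shape


def select_files(entries: List[FileEntry], skip_last: bool) -> List[str]:
    """Choose which files a single listing run would emit.

    skip_last=True  mimics the documented behavior: files that share the
                    newest timestamp are held back for a later run.
    skip_last=False emits everything, which is safe only when the writer
                    is known to have finished before the listing starts.
    """
    if not entries:
        return []
    if not skip_last:
        return sorted(path for path, _ in entries)
    newest = max(ts for _, ts in entries)
    return sorted(path for path, ts in entries if ts < newest)


# A directory fully written by an upstream job before the listing is triggered.
batch = [("part-0", 100), ("part-1", 200), ("part-2", 200)]

held_back = select_files(batch, skip_last=True)    # ["part-0"]
everything = select_files(batch, skip_last=False)  # ["part-0", "part-1", "part-2"]
```

With the flag disabled, a one-shot flow gets the complete directory in a single run instead of having to trigger the processor a second time to collect the newest files.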


  was:
h2. Current Situation

From [official documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]

* Each time a listing is performed, the files with the latest timestamp will be excluded and
picked up during the next execution of the processor. This is done to ensure that we do not
miss any files, or produce duplicates, in the cases where files with the same timestamp are
written immediately before and after a single execution of the processor.

h2. Improvement Proposal

* If we are calling the ListHDFS only after a certain operation which populates an HDFS directory
has finished, it is pointless to skip the last file, and avoiding this behavior is tricky.
* A mandatory property "skip last" should be implemented in order to be able to actively decide
whether or not this behavior is necessary, based on the use case.
* This is also particularly useful in combination with [NIFI-6462]|https://issues.apache.org/jira/browse/NIFI-6462]



> ListHDFS: skip last should be optional
> --------------------------------------
>
>                 Key: NIFI-6465
>                 URL: https://issues.apache.org/jira/browse/NIFI-6465
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> From [official documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]
> * Each time a listing is performed, the files with the latest timestamp will be excluded and picked up during the next execution of the processor. This is done to ensure that we do not miss any files, or produce duplicates, in the cases where files with the same timestamp are written immediately before and after a single execution of the processor.
> h2. Improvement Proposal
> * If we are calling the ListHDFS only after a certain operation which populates an HDFS directory has finished, it is pointless to skip the last file, and avoiding this behavior is tricky.
> * A mandatory property "skip last" should be implemented in order to be able to actively decide whether or not this behavior is necessary, based on the use case.
> * This is also particularly useful in combination with [NIFI-6462]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
