nifi-issues mailing list archives

From "Alessandro D'Armiento (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NIFI-6462) ListHDFS should be triggerable
Date Sat, 20 Jul 2019 13:18:00 GMT

     [ https://issues.apache.org/jira/browse/NIFI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro D'Armiento updated NIFI-6462:
----------------------------------------
    Priority: Minor  (was: Major)

> ListHDFS should be triggerable
> ------------------------------
>
>                 Key: NIFI-6462
>                 URL: https://issues.apache.org/jira/browse/NIFI-6462
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> ListHDFS is designed to be only the entry point of a data integration pipeline, and therefore it can only be scheduled on a timer-driven or CRON-driven basis.
> h2. Improvement Proposal
> ListHDFS should also be usable in the middle of a pipeline, not only as its entry point. To achieve this:
> * It has to be triggerable by an incoming flowfile
> * The trigger flowfile should be able to carry the directory to list as an attribute
> * Some logic, such as "skip the last file in the listing directory", should be made optional
> * Since the processor will follow 1:N semantics (1 input trigger flowfile, N output flowfiles), it would be useful to support fragmentation attributes (for example, for subsequent merge operations)
>   * It would also be useful to support different fragmentation strategies to cover multiple use cases. For example, it should be possible to select:
>     *  A "one for all" fragmentation strategy, which creates a single fragmentation group: all flowfiles share the same fragment.identifier and the same fragment.count, equal to the total number N of listed files, and fragment.index ∈ [0, N).
>     *  A "per subdir" fragmentation strategy, which creates one fragmentation group for each scanned subdirectory of the given path: the flowfiles from the i-th subdirectory share a specific fragment.identifier, their fragment.count is equal to the number N_i of files in that directory, and fragment.index ∈ [0, N_i).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
