Date: Sat, 20 Jul 2019 13:20:00 +0000 (UTC)
From: "Alessandro D'Armiento (JIRA)"
To: issues@nifi.apache.org
Reply-To: dev@nifi.apache.org
Subject: [jira] [Updated] (NIFI-6462) ListHDFS should be triggerable

     [ https://issues.apache.org/jira/browse/NIFI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro D'Armiento updated NIFI-6462:
----------------------------------------
    Description:

h2. Current Situation

ListHDFS is designed to be (only) the entry point of a data integration pipeline, and therefore can only be triggered on a cron or timer basis.

h2. Improvement Proposal

ListHDFS should be usable as part of a pipeline even when it is not the entry point.
To achieve this:
 * It has to be triggerable
 * The trigger flowfile should be able to carry the listing directory as an attribute
 * Some logic, such as the "skip the last file in the listing directory" behavior, should be made optional
 * Since the processor will work with 1:N semantics (1 input trigger flowfile, N output flowfiles), it would be nice to support fragmentation attributes (for example, for subsequent merge operations)
 ** It would also be useful to support different fragmentation strategies, in order to support multiple use cases. For example, it should be possible to select:
 *** A "one for all" fragmentation strategy, which creates a single fragmentation group. All files then share the same fragment.identifier and the same fragment.count, equal to the total number N of listed files, with fragment.index ∈ [0, N).
 *** A "per subdir" fragmentation strategy, which creates a separate fragmentation group for each scanned subdirectory of the given path. For each subdirectory, flowfiles share a specific fragment.identifier, fragment.count equals the number Ni of files in the i-th directory, and fragment.index ∈ [0, Ni).

  was:

h2. Current Situation

ListHDFS is designed to be (only) the entry point of a data integration pipeline, and therefore can only be triggered on a cron or timer basis.

h2. Improvement Proposal

ListHDFS should be usable as part of a pipeline even when it is not the entry point.
To achieve this:
 * It has to be triggerable
 * The trigger flowfile should be able to carry the listing directory as an attribute
 * Some logic, such as the "skip the last file in the listing directory" behavior, should be made optional
 * Since the processor will work with 1:N semantics (1 input trigger flowfile, N output flowfiles), it would be nice to support fragmentation attributes (for example, for subsequent merge operations)
 * It would also be useful to support different fragmentation strategies, in order to support multiple use cases. For example, it should be possible to select:
 * A "one for all" fragmentation strategy, which creates a single fragmentation group. All files then share the same fragment.identifier and the same fragment.count, equal to the total number N of listed files, with fragment.index ∈ [0, N).
 * A "per subdir" fragmentation strategy, which creates a separate fragmentation group for each scanned subdirectory of the given path. For each subdirectory, flowfiles share a specific fragment.identifier, fragment.count equals the number Ni of files in the i-th directory, and fragment.index ∈ [0, Ni).

> ListHDFS should be triggerable
> ------------------------------
>
>                 Key: NIFI-6462
>                 URL: https://issues.apache.org/jira/browse/NIFI-6462
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> ListHDFS is designed to be (only) the entry point of a data integration pipeline, and therefore can only be triggered on a cron or timer basis.
> h2. Improvement Proposal
> ListHDFS should be usable as part of a pipeline even when it is not the entry point.
> To achieve this:
>  * It has to be triggerable
>  * The trigger flowfile should be able to carry the listing directory as an attribute
>  * Some logic, such as the "skip the last file in the listing directory" behavior, should be made optional
>  * Since the processor will work with 1:N semantics (1 input trigger flowfile, N output flowfiles), it would be nice to support fragmentation attributes (for example, for subsequent merge operations)
>  ** It would also be useful to support different fragmentation strategies, in order to support multiple use cases. For example, it should be possible to select:
>  *** A "one for all" fragmentation strategy, which creates a single fragmentation group. All files then share the same fragment.identifier and the same fragment.count, equal to the total number N of listed files, with fragment.index ∈ [0, N).
>  *** A "per subdir" fragmentation strategy, which creates a separate fragmentation group for each scanned subdirectory of the given path. For each subdirectory, flowfiles share a specific fragment.identifier, fragment.count equals the number Ni of files in the i-th directory, and fragment.index ∈ [0, Ni).

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
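
Editorial illustration of the two fragmentation strategies proposed in the description above. This is a minimal Python sketch, not NiFi code: the function names `one_for_all` and `per_subdir`, the use of a random UUID as `fragment.identifier`, and the choice to group files by their immediate parent directory are all assumptions made for the example. Only the attribute names and the index ranges [0, N) and [0, Ni) come from the issue text.

```python
import posixpath
import uuid
from collections import defaultdict


def one_for_all(paths):
    """Put every listed file into a single fragmentation group."""
    group_id = str(uuid.uuid4())  # assumption: any unique string would do
    count = len(paths)            # N, the total number of listed files
    return [
        {
            "path": p,
            "fragment.identifier": group_id,
            "fragment.count": count,
            "fragment.index": i,  # i ranges over [0, N)
        }
        for i, p in enumerate(paths)
    ]


def per_subdir(paths):
    """Create one fragmentation group per parent directory."""
    by_dir = defaultdict(list)
    for p in paths:
        # assumption: a file belongs to the group of its immediate parent
        by_dir[posixpath.dirname(p)].append(p)

    result = []
    for files in by_dir.values():
        # each directory is handled as its own "one for all" group,
        # so fragment.count is Ni and fragment.index is in [0, Ni)
        result.extend(one_for_all(files))
    return result
```

With the hypothetical listing `["/data/a/1.csv", "/data/a/2.csv", "/data/b/3.csv"]`, `one_for_all` would yield one group with `fragment.count` 3, while `per_subdir` would yield a group of 2 for `/data/a` and a group of 1 for `/data/b`, each with its own identifier.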