Date: Tue, 6 Sep 2016 19:12:20 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@nifi.apache.org
Reply-To: dev@nifi.apache.org
Subject: [jira] [Commented] (NIFI-2735) Add processor to perform simple aggregations

    [ https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468258#comment-15468258 ]

ASF GitHub Bot commented on NIFI-2735:
--------------------------------------

Github user olegz commented on the issue:

    https://github.com/apache/nifi/pull/988

    Reviewing. . .

> Add processor to perform simple aggregations
> --------------------------------------------
>
>                 Key: NIFI-2735
>                 URL: https://issues.apache.org/jira/browse/NIFI-2735
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> This is a proposal for a new processor (AggregateValues, for example) that can perform simple aggregation operations such as count, sum, average, min, max, and concatenate over a set of "related" flow files. For example, when a JSON file is split on an array (using the SplitJson processor), the total count of the splits, the index of each split, and the unique identifier (shared by each split) are stored as attributes in each flow file sent to the "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from the original document, and when all documents from a split have been processed, a flow file could be transferred to an "aggregate" relationship containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation operations) is that you can use the "aggregate" relationship as an event trigger. For example, if you need to wait until all files from a group are processed, you can use AggregateValues and the "aggregate" relationship to indicate downstream that the entire group has been processed. If there is no Split processor upstream, then the attributes (fragment.*) would have to be set by the data flow designer, but this can be accomplished with other processors (including the scripting processors if necessary).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
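
The grouping behaviour proposed in the issue can be sketched roughly as follows. This is a toy model in plain Python, not NiFi code and not the implementation in the pull request: the class reuses the proposed name AggregateValues only for readability, flow files are modelled as plain dicts of attributes, and the "value" and "aggregate.*" attribute names are assumptions made for illustration. Only the fragment.identifier, fragment.index, and fragment.count names come from the Split* processor documentation.

```python
from collections import defaultdict


class AggregateValues:
    """Toy model of the proposed processor; hypothetical, not the NiFi API."""

    def __init__(self):
        # Maps fragment.identifier -> values collected so far for that group.
        self._pending = defaultdict(list)

    def on_flow_file(self, attributes):
        """Record one split. Return aggregate attributes once every split of
        the group has been seen, otherwise return None."""
        group = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        values = self._pending[group]
        # "value" is an assumed attribute holding the number to aggregate.
        values.append(float(attributes["value"]))
        if len(values) < expected:
            return None  # group incomplete: nothing goes to "aggregate" yet
        del self._pending[group]
        # Assumed output attribute names for the "aggregate" relationship.
        return {
            "fragment.identifier": group,
            "aggregate.count": len(values),
            "aggregate.sum": sum(values),
            "aggregate.average": sum(values) / len(values),
            "aggregate.min": min(values),
            "aggregate.max": max(values),
        }


# Three splits of one original document, as a Split* processor might emit them.
processor = AggregateValues()
splits = [
    {"fragment.identifier": "abc", "fragment.index": str(i),
     "fragment.count": "3", "value": v}
    for i, v in enumerate(["4", "10", "1"])
]
results = [processor.on_flow_file(s) for s in splits]
```

Only the last call returns a result, which is what makes the "aggregate" output usable as the "whole group has been processed" trigger described in the issue. The sketch counts arrivals rather than checking fragment.index, so a duplicated split would complete a group early; a real processor would need to guard against that.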