nifi-commits mailing list archives

From "Teresa Jackson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NIFI-252) Enhancement to the Core to support metrics
Date Tue, 13 Jan 2015 18:36:35 GMT
Teresa Jackson created NIFI-252:
-----------------------------------

             Summary: Enhancement to the Core to support metrics 
                 Key: NIFI-252
                 URL: https://issues.apache.org/jira/browse/NIFI-252
             Project: Apache NiFi
          Issue Type: Wish
          Components: Core Framework
    Affects Versions: 0.0.1
            Reporter: Teresa Jackson


I'd like to propose an addition or enhancement to the Core to support volume management and
trend analysis by storing attributes and content in a database, so that they are queryable and
available for display. This information would then be used for statistical roll-ups, metrics,
trend analysis, etc.

Ideally, the new component would receive copies of local provenance events and keep running
totals from them. Like local provenance, it would retain the data for some configurable
period of time, based on the amount of disk space allocated for that process. In addition,
these roll-ups could be sent somewhere else for even longer retention.
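
The roll-up idea described above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not NiFi code: ProvenanceEvent, RollupStore, and
max_buckets are hypothetical names, the hourly bucket size is arbitrary, and a bucket-count
limit stands in for a real disk-space allocation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical, simplified stand-in for a copied provenance event."""
    timestamp: datetime   # timezone-aware time of the event
    event_type: str       # e.g. "RECEIVE" or "SEND"
    component: str        # processor or remote system name
    size_bytes: int


class RollupStore:
    """Keeps running totals per (hour, event_type, component) bucket."""

    def __init__(self, max_buckets):
        # Assumption: a bucket limit is a crude proxy for allocated disk space.
        self.max_buckets = max_buckets
        self._buckets = {}  # key -> [file_count, byte_count]

    def record(self, event):
        """Fold one event into the running totals for its hourly bucket."""
        hour = event.timestamp.replace(minute=0, second=0, microsecond=0)
        key = (hour, event.event_type, event.component)
        totals = self._buckets.setdefault(key, [0, 0])
        totals[0] += 1
        totals[1] += event.size_bytes
        self._evict()

    def _evict(self):
        # Once over the allocation, drop the buckets with the oldest hour first.
        while len(self._buckets) > self.max_buckets:
            oldest = min(self._buckets, key=lambda k: k[0])
            del self._buckets[oldest]

    def totals(self, event_type, start, end):
        """Return (file_count, byte_count) for event_type in [start, end)."""
        files = size = 0
        for (hour, etype, _component), (f, b) in self._buckets.items():
            if etype == event_type and start <= hour < end:
                files += f
                size += b
        return files, size
```

Because only totals are kept, retention cost grows with the number of distinct buckets
rather than with the number of events, which is what would make longer retention of the
roll-ups cheaper than retaining the raw provenance data.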

The goal is to keep as many hooks as possible so that other programs/services can ingest
both the local provenance logs and the rolled-up summaries.  There's a growing
base of people who are comfortable with NiFi graphs and local provenance, so I think that
it makes sense to build on that.

The issue I'm facing is that Provenance is fine for tracking one file if you have a starting
point, but it is not designed to do counting, summarization, and correlation of data, and it
doesn't support advanced queries.

Here are some of the most immediate and pressing use cases for this design.

1. How much traffic came in yesterday (or last week)?
2. Provide statistical counts on items of interest within a flow for a given flow/date range.
3. When was the last file sent to "System X"?
4. Did anything get sent to "System Y"?
5. How much data was marked with a certain tag?
6. How much data was scanned?
7. How much data was detected?
8. How much of a particular type of data was received in bytes?
9. How much data was processed by file count?
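
To make a few of these use cases concrete, here is a minimal sketch of how they might be
answered once event attributes are stored in a queryable database. Everything here is an
assumption for illustration: the events table, its columns, and the helper functions are
hypothetical, and SQLite is used only because it ships with Python, not because the
proposal names it.

```python
import sqlite3

# Hypothetical schema: one row per copied provenance event.
SCHEMA = """
CREATE TABLE events (
    event_time  TEXT NOT NULL,     -- ISO 8601 UTC, so string order is time order
    event_type  TEXT NOT NULL,     -- e.g. 'RECEIVE' or 'SEND'
    destination TEXT,              -- remote system name for SEND events
    mime_type   TEXT,
    size_bytes  INTEGER NOT NULL
)
"""


def open_store(rows):
    """Create an in-memory database and load the given event rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def traffic_received(conn, start, end):
    """Use cases 1 and 9: (file count, bytes) received in [start, end)."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(size_bytes), 0) FROM events "
        "WHERE event_type = 'RECEIVE' AND event_time >= ? AND event_time < ?",
        (start, end),
    ).fetchone()


def last_sent_to(conn, system):
    """Use cases 3 and 4: time of the last send, or None if nothing was sent."""
    return conn.execute(
        "SELECT MAX(event_time) FROM events "
        "WHERE event_type = 'SEND' AND destination = ?",
        (system,),
    ).fetchone()[0]


def bytes_received_by_type(conn, mime_type):
    """Use case 8: total bytes received for one type of data."""
    return conn.execute(
        "SELECT COALESCE(SUM(size_bytes), 0) FROM events "
        "WHERE event_type = 'RECEIVE' AND mime_type = ?",
        (mime_type,),
    ).fetchone()[0]
```

For example, last_sent_to(conn, "System Y") returning None would answer use case 4 with
"no", while a timestamp answers both use cases 3 and 4 at once.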



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
