nifi-commits mailing list archives

From "Teresa Jackson (JIRA)" <>
Subject [jira] [Commented] (NIFI-252) Enhancement to the Core to support metrics
Date Thu, 15 Jan 2015 21:05:35 GMT


Teresa Jackson commented on NIFI-252:

Yes. Thanks 


What would you think about adding a publish/subscribe interface to the Provenance repo?
That way a client can subscribe to events in real time to do the necessary trending and metrics.

Conceptually, here's what I'm thinking.

The local provenance repository pushes each event to a client as it occurs.

The service that sends events to this client in real time would also provide the flowfile
associated with that event.
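
As a rough sketch of what such an interface might look like: every name below (SketchEvent, SketchEventSubscriber, SketchEventPublisher) is hypothetical and is not part of NiFi. The sketch assumes the repository would call publish() once for each event it records, and that the event carries the size and attributes of the associated flowfile.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event shape for illustration only; this is not NiFi's provenance event type.
final class SketchEvent {
    final String eventType;               // e.g. "RECEIVE" or "SEND"
    final String componentId;             // component that emitted the event
    final long sizeBytes;                 // size of the associated flowfile
    final Map<String, String> attributes; // attributes of the associated flowfile

    SketchEvent(String eventType, String componentId, long sizeBytes,
                Map<String, String> attributes) {
        this.eventType = eventType;
        this.componentId = componentId;
        this.sizeBytes = sizeBytes;
        this.attributes = attributes;
    }
}

// Hypothetical callback that a client implements to receive events as they occur.
interface SketchEventSubscriber {
    void onEvent(SketchEvent event);
}

// Hypothetical publisher that the repository would call once per recorded event.
final class SketchEventPublisher {
    // CopyOnWriteArrayList lets clients subscribe or unsubscribe while events are being delivered.
    private final List<SketchEventSubscriber> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(SketchEventSubscriber subscriber) {
        subscribers.add(subscriber);
    }

    void unsubscribe(SketchEventSubscriber subscriber) {
        subscribers.remove(subscriber);
    }

    // Delivers the event to every current subscriber, in subscription order.
    void publish(SketchEvent event) {
        for (SketchEventSubscriber subscriber : subscribers) {
            subscriber.onEvent(event);
        }
    }
}
```

A client would register with subscribe() and do its own trending inside onEvent(). Delivery here is synchronous, so a slow subscriber would hold up the publisher; a real design would probably need a queue in between.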

<Do note that all processing past this point falls outside the Core; I'm just adding it
for contextual background.>

The client sends the events to a database somewhere. The "Aggregator" processor can then get back
a list of attributes that the user selected via that processor's configuration. That processor
would then generate a flowfile whose content is the set of metrics; that flowfile can be sent
wherever for additional analysis. Another processor would be used to support multiple data
formats (XML, JSON, proprietary formats, whatever).

I'm not too wrapped up in whether the 'Aggregator' has an output format attribute that can
be set, or whether it outputs a flowfile that the CreateFormatProcessor ingests, with that
processor carrying the output format attribute.

The point is that if this feature were implemented with a publish/subscribe interface, then
aggregation, summarization, and correlation of data for trending could occur.

The overall flow would look something like this:

Local NiFi Provenance repo -> Client -> Aggregator -> PutProcessor
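
To make the "Aggregator" stage a bit more concrete, here is a minimal sketch of the kind of running totals it might keep. AttributeAggregator is a hypothetical name, not an existing NiFi processor; the sketch assumes the user has selected a single attribute name, and that each event supplies the flowfile's attributes and its size in bytes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator: keeps a file count and a byte total per value of one selected attribute.
final class AttributeAggregator {
    private final String attributeName;
    // Maps an attribute value to {file count, total bytes}.
    private final Map<String, long[]> totals = new HashMap<>();

    AttributeAggregator(String attributeName) {
        this.attributeName = attributeName;
    }

    // Folds one event into the running totals; events without the selected attribute are skipped.
    void accept(Map<String, String> attributes, long sizeBytes) {
        String value = attributes.get(attributeName);
        if (value == null) {
            return;
        }
        long[] entry = totals.computeIfAbsent(value, key -> new long[2]);
        entry[0] += 1;
        entry[1] += sizeBytes;
    }

    // Number of files seen with the given attribute value (0 if none).
    long count(String value) {
        long[] entry = totals.get(value);
        return entry == null ? 0L : entry[0];
    }

    // Total bytes seen with the given attribute value (0 if none).
    long bytes(String value) {
        long[] entry = totals.get(value);
        return entry == null ? 0L : entry[1];
    }
}
```

Totals like these would cover questions such as "how much data was processed by file count" or "how much of a particular type of data was received in bytes". The sketch is not thread-safe and keeps everything in memory, so retention and persistence are left open.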

> Enhancement to the Core to support metrics 
> -------------------------------------------
>                 Key: NIFI-252
>                 URL:
>             Project: Apache NiFi
>          Issue Type: Wish
>          Components: Core Framework
>    Affects Versions: 0.0.1
>            Reporter: Teresa Jackson
> I'd like to propose that an addition or enhancement be made to the Core to support volume
management and trend analysis by storing attributes and content in a database so that they are
queryable and available for display. This information would then be used for statistical roll-ups,
metrics, trend analysis, etc.
> Ideally, we'd do it by receiving copies of local provenance events and keeping running
totals. This component would be like local provenance in that it would retain the data for
some configurable period of time, based on the amount of disk space allocated for that process.
In addition, these roll-ups could be sent somewhere for even longer retention.
> The goal is to keep as many hooks as possible so that other programs/services can ingest
both the local provenance logs and the rolled-up summaries. There's a growing
base of people who are comfortable with NiFi graphs and local provenance, so I think that
it makes sense to build off that.
> The issue I'm facing is that Provenance is fine for tracking one file if you have a starting
point, but it is not designed to do counting, summarization and correlation of data. And it
doesn't support advanced queries.
> Here are some of the most immediate and pressing use cases for this design.
> 1. How much traffic came in yesterday (or last week)?
> 2. Provide statistical counts on items of interest within a flow for a given flow/date.
> 3. When was the last file sent to "System X"?
> 4. Did anything get sent to "System Y"?
> 5. How much data was marked with a certain tag?
> 6. How much data was scanned?
> 7. How much data was detected?
> 8. How much of a particular type of data was received in bytes?
> 9. How much data was processed by file count?

This message was sent by Atlassian JIRA
