nifi-dev mailing list archives

From Matt Burgess <mattyb...@gmail.com>
Subject Re: How can we validate a data flow.?
Date Tue, 16 Aug 2016 00:12:34 GMT
I've used a handful of techniques/scripts to get the data into an analysis tool:

1) NiFi REST API to get the Provenance data
2) Groovy script to transform it into Apache TinkerPop 3.x format (can
provide if desired)
3) Gremlin script to load the TinkerPop file into Neo4j (can't find it,
but it was probably a couple of lines of Gremlin code)
4) Cypher queries in Neo4j for analysis
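
As a rough illustration of the transform in step 2, here is a minimal
Python sketch (not the Groovy script mentioned above) that turns a list of
provenance events into FlowFile vertices and parent-to-child edges, then
walks the result the way a graph query might. The field names
(flowFileUuid, parentUuids, childUuids, eventType) are assumed to match
what the REST API returns for each event, and the sample events are made
up for the example.

```python
from collections import defaultdict, deque

# Made-up sample events; the field names are an assumption about the
# shape of the provenance events returned by the REST API.
SAMPLE_EVENTS = [
    {"eventType": "CREATE", "flowFileUuid": "ff-a",
     "parentUuids": [], "childUuids": []},
    {"eventType": "FORK", "flowFileUuid": "ff-a",
     "parentUuids": ["ff-a"], "childUuids": ["ff-b", "ff-c"]},
    {"eventType": "JOIN", "flowFileUuid": "ff-d",
     "parentUuids": ["ff-b", "ff-c"], "childUuids": ["ff-d"]},
    {"eventType": "SEND", "flowFileUuid": "ff-d",
     "parentUuids": [], "childUuids": []},
]


def build_lineage_graph(events):
    """Return (vertices, edges) for a list of provenance events.

    vertices maps each FlowFile UUID to the event types recorded on it;
    edges is a sorted list of (parent_uuid, child_uuid, event_type).
    """
    vertices = defaultdict(list)
    edges = set()
    for ev in events:
        vertices[ev["flowFileUuid"]].append(ev["eventType"])
        for parent in ev.get("parentUuids", []):
            for child in ev.get("childUuids", []):
                # Make sure both endpoints exist as vertices.
                vertices.setdefault(parent, [])
                vertices.setdefault(child, [])
                edges.add((parent, child, ev["eventType"]))
    return dict(vertices), sorted(edges)


def descendants(edges, start):
    """Breadth-first walk: every FlowFile derived from `start`."""
    children = defaultdict(set)
    for parent, child, _ in edges:
        children[parent].add(child)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node] - seen:
            seen.add(child)
            queue.append(child)
    return seen


vertices, edges = build_lineage_graph(SAMPLE_EVENTS)
print(edges)
print(sorted(descendants(edges, "ff-a")))
```

The vertex and edge lists could then be written out in whatever format the
graph DB loader expects; that serialization is left out here.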

The first two should be portable to Apache NiFi proper, using the
SiteToSiteProvenanceReportingTask and ExecuteScript. The third step
might be a sticky wicket, but it can be done with ExecuteScript by
adding Module Directories that point to a Gremlin client install.
Depending on the target graph DB chosen (e.g. Neo4j, Titan, OrientDB),
step 4 becomes the analysis task at the reporting engine. If graph
traversal/analysis of provenance data is desired, please feel free to
open feature Jiras to cover this (e.g. PutGraphSON, ExecuteGremlin),
or we can discuss and I'll help with Jiras, etc. as needed.

If you are less interested in lineage and more interested in the
events themselves (e.g. for OLAP), you can use the REST API to get the
same tabular information that is shown on the initial provenance page,
and perhaps normalize that into a star schema for querying later
using SQL / Data Warehouse / OLAP techniques.
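
To make the star schema idea concrete, here is a small Python sketch using
an in-memory SQLite database. The table and column names are hypothetical,
the event field names (eventId, eventTime, eventType, componentId,
componentName, componentType, fileSizeBytes) are assumed to match the
REST API output, and the sample rows are invented for the example.

```python
import sqlite3

# Hypothetical star schema: one fact table of events plus two dimensions.
SCHEMA = """
CREATE TABLE dim_component (
    component_key  INTEGER PRIMARY KEY,
    component_id   TEXT UNIQUE,
    component_name TEXT,
    component_type TEXT
);
CREATE TABLE dim_event_type (
    event_type_key INTEGER PRIMARY KEY,
    event_type     TEXT UNIQUE
);
CREATE TABLE fact_event (
    event_id        INTEGER PRIMARY KEY,
    event_time      TEXT,
    file_size_bytes INTEGER,
    component_key   INTEGER REFERENCES dim_component(component_key),
    event_type_key  INTEGER REFERENCES dim_event_type(event_type_key)
);
"""

# Invented sample rows standing in for the tabular provenance data.
SAMPLE_EVENTS = [
    {"eventId": 1, "eventTime": "2016-08-15 10:00:00", "eventType": "RECEIVE",
     "componentId": "c1", "componentName": "GetFile",
     "componentType": "GetFile", "fileSizeBytes": 100},
    {"eventId": 2, "eventTime": "2016-08-15 10:00:05", "eventType": "RECEIVE",
     "componentId": "c1", "componentName": "GetFile",
     "componentType": "GetFile", "fileSizeBytes": 50},
    {"eventId": 3, "eventTime": "2016-08-15 10:00:09", "eventType": "SEND",
     "componentId": "c2", "componentName": "PutFile",
     "componentType": "PutFile", "fileSizeBytes": 100},
]


def load_events(conn, events):
    """Create the schema and normalize each event into it."""
    conn.executescript(SCHEMA)
    for ev in events:
        # Dimension rows are inserted once; repeats are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO dim_component "
            "(component_id, component_name, component_type) VALUES (?, ?, ?)",
            (ev["componentId"], ev["componentName"], ev["componentType"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_event_type (event_type) VALUES (?)",
            (ev["eventType"],),
        )
        # The fact row stores surrogate keys looked up from the dimensions.
        conn.execute(
            "INSERT INTO fact_event "
            "SELECT ?, ?, ?, c.component_key, t.event_type_key "
            "FROM dim_component c, dim_event_type t "
            "WHERE c.component_id = ? AND t.event_type = ?",
            (ev["eventId"], ev["eventTime"], ev["fileSizeBytes"],
             ev["componentId"], ev["eventType"]),
        )
    conn.commit()


def events_per_component(conn):
    """Example OLAP-style rollup: event count and bytes per component."""
    return conn.execute(
        "SELECT c.component_name, COUNT(*), SUM(f.file_size_bytes) "
        "FROM fact_event f JOIN dim_component c USING (component_key) "
        "GROUP BY c.component_name ORDER BY c.component_name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_events(conn, SAMPLE_EVENTS)
print(events_per_component(conn))
```

The same rollup could be grouped by day or by event type to answer the
"how did my flow run over the last week" kind of question without clicking
through individual events.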

Regards,
Matt

On Mon, Aug 15, 2016 at 3:22 PM, saikrishnat <saikrishnat@gmail.com> wrote:
> Hi,
> But it would still be very helpful to have a reporting app or log on top
> of it. It is very hard to go into provenance and click on each activity to
> see the details. Also, if I want to check how my flow ran over the last
> week, that would be very difficult with hundreds coming in.
> I was hoping someone has gone through a similar situation and could share
> how they did the tests.
