nifi-dev mailing list archives

From Andy LoPresto <alopre...@apache.org>
Subject Re: Custom processors in MiNiFi
Date Fri, 27 Jan 2017 19:06:11 GMT
One other note: you may find additional help on our developers list, dev@nifi.apache.org.
This list is more focused on user issues and functionality, while that list goes much deeper
into the weeds on coding.

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 27, 2017, at 11:04 AM, Andy LoPresto <alopresto@apache.org> wrote:
> 
> Hi Aakash.
> 
> Last summer I had an intern working for me who investigated using machine learning
> (unsupervised anomaly detection using kNN and LOF) against NiFi provenance data to perform
> error identification and build a processor recommendation engine. I can’t share the work as
> it is company internal, but there is definitely a growing community and interest in what
> you’re discussing.
> 
> If you truly want to distribute the computational load of performing the analysis to edge
> nodes, writing custom processors is likely a requirement. Can I make two suggestions before
> you begin writing code, though? First, investigate whether you could deploy something like
> scikit-learn (Python) [1] or Apache Spark MLlib [2] to reside alongside NiFi on the edge
> nodes (this obviously depends on hardware resources). Our early efforts involved writing
> custom NiFi code, but it turned out to be much easier to offload the data to scikit-learn
> and then ingest the results back into NiFi to continue the data flow, leaving the
> computation to an external system.
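
As a rough illustration of the offloading pattern described above, here is a minimal sketch of an external scoring step. It is not the internal work mentioned in this thread. It assumes each record arrives as one JSON object per line with a numeric `features` list, and it uses a plain kNN mean-distance score written with the standard library only, standing in for what scikit-learn would do in a real deployment. The field names `features` and `anomaly_score`, the function names, and the default `k` are all hypothetical.

```python
import json
import math


def knn_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    Larger scores mean the point sits further from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def score_lines(lines, k=3):
    """Parse one JSON record per line and attach an 'anomaly_score' field.

    'features' and 'anomaly_score' are assumed field names, not a NiFi format.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    points = [tuple(r["features"]) for r in records]
    for record, score in zip(records, knn_scores(points, k)):
        record["anomaly_score"] = score
    return records


def run(lines, write, k=3):
    """Score every input line and emit one JSON object per output line."""
    for record in score_lines(lines, k):
        write(json.dumps(record) + "\n")
```

Called as `run(sys.stdin, sys.stdout.write)` after an `import sys`, a script like this could sit behind a processor that pipes flow file content to an external command and reads its output back (ExecuteStreamCommand is one candidate, though that wiring is an assumption here). The scored records would then re-enter the flow for routing.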
> 
> If you really want the computation to run inside the NiFi JVM, also look at the
> ExecuteScript processor before trying to write a custom processor. While NiFi makes it easy
> to deploy custom code, the development cycle imposes a few recurring delays: after you
> generate the Maven pom for the NAR, you have to write the code in an IDE, test it, compile,
> build the NAR, drop it into the NiFi lib directory, and restart the entire application every
> time you make a change. To prototype your model, I recommend using the ExecuteScript
> processor, which provides immediate feedback. It also abstracts away a lot of the
> boilerplate framework so you can focus on the domain work. Matt Burgess has written a number
> of great articles that should get you up and running with it [3].
> 
> Once you have a model and computation you’re confident in, then it’s easy to translate
> it to a dedicated custom processor and deploy it. I find this methodology saves me a lot of
> time and a bit of frustration. Good luck. I’m very curious to see what your work yields.
> 
> [1] http://scikit-learn.org/stable/
> [2] https://spark.apache.org/mllib/
> [3] https://funnifi.blogspot.com
> 
> 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Jan 27, 2017, at 5:45 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>> 
>> Hi, Aakash!
>> 
>> To my knowledge, there has not been any discussion of such processors on the lists
>> specifically, although I have heard people mention assorted libraries that might be a good
>> fit for the NiFi ecosystem's intended purposes. There has been some foundational work, such
>> as the following issues, which allow processors to make use of the state management
>> features in NiFi to manage the flow of data for higher-level inspection/analysis.
>> 
>> https://issues.apache.org/jira/browse/NIFI-1582
>> https://issues.apache.org/jira/browse/NIFI-1682
>> https://issues.apache.org/jira/browse/NIFI-2590
>> 
>> If my understanding of your question is correct, I believe your notion of distribution
>> may not directly align with the intended focus of NiFi, but there certainly could be some
>> aspects that work. Would you be willing to explain in greater detail how you envision such
>> processors interacting with data, and possibly name some of the libraries you were
>> considering in your initial message?
>> 
>> Thanks!
>> 
>> --aldrin
>> 
>> On Fri, Jan 27, 2017 at 7:38 AM, Aakash Khochare <aakhochare@grads.cds.iisc.ac.in> wrote:
>> Greetings,
>> 
>> While I understand that the primary use of NiFi/MiNiFi is secure data ingress with the
>> added benefit of provenance, what are the community's views on writing processors that
>> implement machine learning algorithms and distributing them across Edge + Cloud using NiFi
>> and MiNiFi? Has anyone tried writing such processors?
>> 
>> Regards,
>> 
>> Aakash Khochare
>> 
>> 
>> 
> 

