From issues-return-96753-archive-asf-public=cust-asf.ponee.io@nifi.apache.org Mon May 4 17:08:02 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 14DC1180665 for ; Mon, 4 May 2020 19:08:01 +0200 (CEST) Received: (qmail 29998 invoked by uid 500); 4 May 2020 17:08:01 -0000 Mailing-List: contact issues-help@nifi.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@nifi.apache.org Delivered-To: mailing list issues@nifi.apache.org Received: (qmail 29988 invoked by uid 99); 4 May 2020 17:08:01 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 May 2020 17:08:01 +0000 Received: from jira-he-de.apache.org (static.172.67.40.188.clients.your-server.de [188.40.67.172]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 997F3E2569 for ; Mon, 4 May 2020 17:08:00 +0000 (UTC) Received: from jira-he-de.apache.org (localhost.localdomain [127.0.0.1]) by jira-he-de.apache.org (ASF Mail Server at jira-he-de.apache.org) with ESMTP id 18EC0780309 for ; Mon, 4 May 2020 17:08:00 +0000 (UTC) Date: Mon, 4 May 2020 17:08:00 +0000 (UTC) From: "Marc Parisi (Jira)" To: issues@nifi.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (MINIFICPP-1201) Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ Runtime Python Wrapper) To Do ML Inference on Edge MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MINIFICPP-1201?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 17099139#comment-17099139 ]=20 Marc Parisi commented on MINIFICPP-1201: ---------------------------------------- +1 Yeah totally. thanks for following up. I think in the e-mail chain it wa= s becoming apparent that I wasn't correctly relaying that I think H2O-3 nee= ded to be introduced as a corrective action. > Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ R= untime Python Wrapper) To Do ML Inference on Edge > -------------------------------------------------------------------------= ------------------------------------------------- > > Key: MINIFICPP-1201 > URL: https://issues.apache.org/jira/browse/MINIFICPP-1201 > Project: Apache NiFi MiNiFi C++ > Issue Type: New Feature > Affects Versions: master > Environment: Ubuntu 18.04 in AWS EC2 > MiNiFi C++ 0.7.0 > Reporter: James Medel > Priority: Blocker > Fix For: 0.8.0 > > > *MiNiFi C++ and H2O Driverless AI Integration*=C2=A0via Custom Python Pro= cessors: > Integrates MiNiFi C++ with H2O Driverless AI by using Driverless AI's MOJ= O Scoring Pipeline (in C++ Runtime Python Wrapper) and MiNiFi's Custom Pyth= on Processor. Uses a Python Processor to execute the MOJO Scoring Pipeline = to do batch scoring or real-time scoring for one or more predicted labels o= n tabular test data in the incoming flow file content. If the tabular data = is one row, then the MOJO does real-time scoring. If the tabular data is mu= ltiple rows, then the MOJO does batch scoring. I would like to contribute m= y processors to MiNiFi C++ as a new feature. > *1 custom python processor*=C2=A0created for MiNiFi: > *H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline i= n C++ Runtime Python Wrapper to do batch scoring or real-time scoring on a = frame of data within each incoming flow file. Requires the user to add the = *pipeline.mojo* filepath into the "MOJO Pipeline Filepath" property. This p= roperty is used in the onTrigger(context, session) function to get the pipe= line.mojo filepath, so we can *pass it into* the *daimojo.model(pipeline_mo= jo_filepath)* function to instantiate our *mojo_scorer*. MOJO creation time= and uuid are added as individual flow file attributes. Then the *flow file= content* is *loaded into Datatable* *frame* to hold the test data. Then a = Python lambda function called compare is used to compare whether the=C2=A0d= atatable frame header column names equals the expected header column names = from the mojo scorer. This check is done because the datatable frame could = have a missing header, which is true when the header does not equal the exp= ected header and so we update the datatable frame header with the mojo scor= er's expected header. Having the correct header works nicely because the *m= ojo scorer's* *predict(datatable_frame)* function needs the header and then= does the prediction returning a predictions datatable frame. The mojo scor= er's predict function is *capable of doing real-time scoring or batch scori= ng*, it just depends on the amount of rows that the tabular data has. This = predictions datatable frame is then converted to pandas dataframe, so we ca= n use pandas' to_string(index=3DFalse) function to convert the dataframe to= a string without the dataframe's index. Then *the prediction string is wri= tten to flow file content*. A flow file attribute is added for the number o= f rows scored. Another one or more flow file attributes are added for the p= redicted label name and its associated score.=C2=A0Finally, the flow file i= s transferred on a success relationship. > =C2=A0 > *Hydraulic System Condition Monitoring*=C2=A0Data used in MiNiFi Flow: > The sensor test data I used in this integration comes from=C2=A0Kaggle: C= ondition Monitoring of Hydraulic Systems. I was able to predict hydraulic s= ystem cooling efficiency through MiNiFi and H2O integration described above= . This use case here is hydraulic system predictive maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005)