From issues-return-96727-archive-asf-public=cust-asf.ponee.io@nifi.apache.org Mon May 4 14:49:03 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 4B486180608 for ; Mon, 4 May 2020 16:49:03 +0200 (CEST) Received: (qmail 42117 invoked by uid 500); 4 May 2020 14:49:02 -0000 Mailing-List: contact issues-help@nifi.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@nifi.apache.org Delivered-To: mailing list issues@nifi.apache.org Received: (qmail 42064 invoked by uid 99); 4 May 2020 14:49:02 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 May 2020 14:49:02 +0000 Received: from jira-he-de.apache.org (static.172.67.40.188.clients.your-server.de [188.40.67.172]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 87F36E2F8D for ; Mon, 4 May 2020 14:49:01 +0000 (UTC) Received: from jira-he-de.apache.org (localhost.localdomain [127.0.0.1]) by jira-he-de.apache.org (ASF Mail Server at jira-he-de.apache.org) with ESMTP id 5059A7808FB for ; Mon, 4 May 2020 14:49:00 +0000 (UTC) Date: Mon, 4 May 2020 14:49:00 +0000 (UTC) From: "Joe Witt (Jira)" To: issues@nifi.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (MINIFICPP-1199) Integrates MiNiFi C++ with H2O Driverless AI Python Scoring Pipeline To Do ML Inference on Edge MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MINIFICPP-1199?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 17099003#comment-17099003 ]=20 Joe Witt commented on MINIFICPP-1199: ------------------------------------- tagging as a blocker since the commit has been merged and yet licensing rem= ains a concern > Integrates MiNiFi C++ with H2O Driverless AI Python Scoring Pipeline To D= o ML Inference on Edge > -------------------------------------------------------------------------= ---------------------- > > Key: MINIFICPP-1199 > URL: https://issues.apache.org/jira/browse/MINIFICPP-1199 > Project: Apache NiFi MiNiFi C++ > Issue Type: New Feature > Affects Versions: master > Environment: Ubuntu 18.04 in AWS EC2 > MiNiFi C++ 0.7.0 > Reporter: James Medel > Priority: Blocker > Fix For: master > > > *MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processo= rs: > Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's P= ython Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the Pyth= on Processors to execute the Python Scoring Pipeline scorer to do batch sco= ring and real-time scoring for one or more predicted labels on test data in= the incoming flow file content. I would like to contribute my processors t= o MiNiFi C++ as a new feature. > =C2=A0 > *3 custom python processors*=C2=A0created for MiNiFi: > *H2oPspScoreRealTime* - Executes H2O Driverless AI's Python Scoring Pipel= ine to do interactive scoring (real-time) scoring on an individual row or l= ist of test data within each incoming flow file. Uses H2O's open-source Dat= atable library to load test data into a frame, then converts it to pandas d= ataframe. Pandas is used to convert the pandas dataframe rows to a list of = lists, but since each flow file passing through this processor should have= =C2=A0only 1 row, we extract the 1st list. Then that list is passed into th= e Driverless AI's Python scorer.score() function to predict one or more pre= dicted labels. The prediction is returned to a list. The number of predicte= d labels is specified when the user built the Python Scoring Pipeline in Dr= iverless AI. With that knowledge, there is a property for the user to pass = in one or more predicted label names that will be used as the predicted hea= der. I create a comma separated string using the predicted header and predi= cted value. The predicted header(s) is on one line followed by a newline an= d the predicted value(s) is on the next line followed by a newline. The str= ing is written to the flow file content. Flow File attributes are added to = the flow file for the number of lists scored and the predicted label name a= nd its associated score. Finally, the flow file is transferred on a success= relationship. > =C2=A0 > *H2oPspScoreBatches* - Executes H2O Driverless AI's Python Scoring Pipeli= ne to do batch scoring on a frame of data within each incoming flow file. U= ses H2O's open-source Datatable library to load test data into a frame. Eac= h frame from the flow file passing through this processor should have multi= ple rows.=C2=A0That frame is passed into the Driverless AI's Python scorer.= score_batch() function to predict one or more predicted labels. The predict= ion is returned to a pandas dataframe, then that dataframe is converted to = a string, so it can be written to the flow file content. Flow File attribut= es are added to the flow file for the number of rows scored. There are also= flow file attributes added for the predicted label name and its associated= score for the first row in the frame.=C2=A0Finally, the flow file is trans= ferred on a success relationship. > =C2=A0 > *ConvertDsToCsv* - Converts data source of incoming flow file to csv. Use= s H2O's open-source Datatable library to load data into a frame,=C2=A0then = converts it to pandas dataframe. Pandas is used to convert the pandas dataf= rame to a csv and store it into in-memory text stream StringIO without pand= as dataframe index. The csv string data is grabbed using file read() functi= on on the StringIO object, so it can be written to the flow file content. T= he flow file is transferred on a success relationship. > =C2=A0 > *Hydraulic System Condition Monitoring* Data used in MiNiFi Flow: > The sensor test data I used in this integration comes from [Kaggle: Condi= tion Monitoring of Hydraulic Systems|[https://www.kaggle.com/jjacostupa/con= dition-monitoring-of-hydraulic-systems#description.txt]]. I was able to pre= dict hydraulic system cooling efficiency through MiNiFi and H2O integration= described above. This use case here is hydraulic system predictive mainten= ance. > =C2=A0 -- This message was sent by Atlassian Jira (v8.3.4#803005)