nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject MiNiFi commits for h2o
Date Mon, 04 May 2020 14:50:00 GMT
Team

The two commits below raise serious concerns.

I want to be clear here and point out that h2o is cool.  Having such an
integration is a neat idea, and it certainly warrants working out how best
to do it.

My issue with these two commits relates to licensing and maintenance.

License:
We should generally support vendors like this who want to bring their
capabilities into NiFi. But the licensing and mode of use are critical.
From talking about this with the contributor on the NiFi side as well, it is
clear that at least some important portions of this require the user to have
a 'driverless ai license' so they can include their jar or build pipelines or
whatnot.  Thus it isn't usable without that license. So it might be the case
that the source depends only on ASF-compatible licenses, but that certainly
doesn't seem to be true of the binary dependencies. Where in the PR or JIRA
is there any discussion or review of the licensing?  I see plenty from James
to believe this needs to be reverted.  Any contrary info?
https://github.com/apache/nifi/pull/4242#issuecomment-622654527


Maintenance:
We should find a way for vendors to offer extension points like this
without the project having to take on the burden of maintenance. How can we
possibly do this well?  We're learning this lesson the hard way in NiFi
itself, and this is why the registry is being formulated.

JIRA/hygiene:
https://issues.apache.org/jira/browse/MINIFICPP-1199
and
https://issues.apache.org/jira/browse/MINIFICPP-1201
Both are open.  Yet we've merged commits that claim to be against each.

I believe both of these commits need to be reverted as I do not believe the
licensing considerations have been addressed.  I'd like to see discussion
on the above maintenance concern as well but that is less pressing.


Thanks
Joe


commit 7206c62240647520cf35649868d5d87903a256c2
Author: James Medel <jamesmedel94@gmail.com>
Date:   Wed Apr 29 12:38:04 2020 -0700

    MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766)

    MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766)

commit 6e5f96518764df7791519c0ee625a94a207ddc69
Author: James Medel <jamesmedel94@gmail.com>
Date:   Wed Apr 29 12:37:00 2020 -0700

    MINIFI-1199: Integrate H2O Driverless AI PSP in MiNiFi (#763)

    * MINIFI-1199: Integrate H2O Driverless AI in MiNiFi

    MiNiFi C++ and H2O Driverless AI Integration via Custom Python
    Processors:
    Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's
    Python Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the
    Python Processors to execute the Python Scoring Pipeline scorer to do batch
    scoring and real-time scoring for one or more predicted labels on test data
    in the incoming flow file content. I would like to contribute my processors
    to MiNiFi C++ as a new feature.

    * Update H2oPspScoreBatches processor

    This update includes passing "index=False" to
    "batch_scores_df.to_string(index=False)". By updating this line of code, we
    tell pandas to not include the DataFrame index when converting the
    DataFrame to a string. The reason for this update is that we don't want
    this extra column pandas tries to add to our predictions frame, we only
    want the predictions. Thus, later when the predictions get saved to a csv
    file, it will only include the predictions.

    * Moved ConvertDsToCsv to h2o base dir

    Since this ConvertDsToCsv python processor is used by the
    Python Scoring Pipeline and MOJO Scoring Pipeline processors,
    I moved ConvertDsToCsv to h2o base dir for easier access to it.
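
For readers unfamiliar with the "index=False" change described in the
H2oPspScoreBatches update above, here is a minimal sketch of what that pandas
argument does. The frame contents and column names are hypothetical and are
not taken from the processor; only the to_string(index=False) call comes from
the commit message.

```python
import pandas as pd

# Hypothetical predictions frame; the column names are illustrative only
# and are not taken from the H2oPspScoreBatches processor.
batch_scores_df = pd.DataFrame(
    {"label.0": [0.9, 0.2], "label.1": [0.1, 0.8]}
)

# Default behaviour: each row is prefixed with its DataFrame index (0, 1, ...).
with_index = batch_scores_df.to_string()

# With index=False the index column is omitted, leaving only the predictions.
without_index = batch_scores_df.to_string(index=False)

print(with_index)
print(without_index)
```

With the default call each data row begins with the row index; with
index=False only the prediction values appear, which is the behaviour the
commit message describes.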
