hive-dev mailing list archives

From "Harish Butani (JIRA)" <>
Subject [jira] [Created] (HIVE-7940) Expose Machine Learning functions and Model application in Hive
Date Tue, 02 Sep 2014 19:57:20 GMT
Harish Butani created HIVE-7940:

             Summary: Expose Machine Learning functions and Model application in Hive
                 Key: HIVE-7940
             Project: Hive
          Issue Type: New Feature
            Reporter: Harish Butani

*Machine Learning functions*
# [HiveMall|] has demonstrated how to do machine learning
in Hive. It has an extensive set of functions, and it shows a way to do iterative computations
through UDTFs and the Amplify technique. There is a lot of interest in the Hive user community
in using it.
# Other possible ways to expose machine learning functionality:
#* via Script Operator (or Table Functions) that call out to a Machine Learning service like
[Oxdata|]. In this scheme the service's nodes would communicate
outside of Hive, process the data in multiple iterations and then return the result back into
the Hive pipeline.
#* At the language level, provide an iteration mechanism in Hive. This has more general applications:
it could also be used to express Recursive CTEs and Graph Algorithms.
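
To make the iteration idea concrete, here is a minimal Python sketch of the fixpoint loop that a recursive CTE or a simple graph algorithm relies on. It is not Hive code and does not reflect any proposed Hive syntax; the function name {{reachable}} and the sample edge list are hypothetical, chosen only for illustration. Each pass joins the newly found nodes against the edge table, and the loop stops when a pass finds nothing new.

```python
def reachable(edges, source):
    """Return the set of nodes reachable from `source` (hypothetical helper).

    `edges` is an iterable of (src, dst) pairs, standing in for an edge table.
    """
    reached = {source}
    frontier = {source}
    while frontier:
        # One iteration: "join" the current frontier with the edge table.
        next_nodes = {dst for (src, dst) in edges if src in frontier}
        # Keep only nodes not seen before; an empty frontier ends the loop.
        frontier = next_nodes - reached
        reached |= frontier
    return reached


# Illustrative data only.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
print(sorted(reachable(edges, "a")))  # ['a', 'b', 'c']
```

A language-level mechanism would have to express the same pattern declaratively: a seed query, a step query that refers to the previous result, and a termination condition.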

*Model Application*
Even when Regression/Classification models are built in other tools, we should provide a way
to evaluate these models against the entire dataset residing in Hive. These can be exposed
as UDFs in Hive. A possible route could be a generic PMML-based module, e.g. [JPMML-Hive|].
Alternatively, we could provide integration for specific libraries: Spark MLlib, R and Python
(SciPy/NumPy) seem the most popular toolkits.
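
As a rough illustration of model application, the Python sketch below scores rows with a logistic regression model whose coefficients were fitted elsewhere. Everything in it is an assumption made for the example: the coefficient values, the names {{score}} and {{score_lines}}, and the row layout (an id followed by numeric features, tab-separated, which is the way Hive's TRANSFORM clause passes columns to a script). A real UDF or a PMML-based module would load the model from a file rather than hard-code it.

```python
import math

# Hypothetical coefficients, standing in for a model trained in another tool.
INTERCEPT = -1.0
WEIGHTS = [0.5, 0.25]


def score(features):
    """Logistic regression score for one row of numeric features."""
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_lines(lines):
    """Score tab-separated rows of the form: id, feature_1, feature_2, ..."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        row_id = fields[0]
        features = [float(v) for v in fields[1:]]
        yield f"{row_id}\t{score(features):.6f}"


# Illustrative rows only.
for out in score_lines(["r1\t2.0\t0.0\n", "r2\t0.0\t0.0\n"]):
    print(out)
```

Feeding {{sys.stdin}} to {{score_lines}} and printing each result would turn this into a script usable through the Script Operator route. A UDF would wrap the same per-row {{score}} logic instead.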

The *goal* would be to provide Machine Learning functionality as a feature of Hive, like [MADlib|]
on Postgres, Pivotal, Impala etc.
Capturing this high-level requirement in this jira.
