spark-issues mailing list archives

From "Caique Rodrigues Marques (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-8855) Python API for Association Rules
Date Thu, 03 Dec 2015 04:07:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037211#comment-15037211
] 

Caique Rodrigues Marques commented on SPARK-8855:
-------------------------------------------------

I am working on this, but I have a question.

The issue description says that the important method is "FPGrowthModel.generateAssociationRules()",
of course. However, it is not clear whether the wrapper for the association rules should live in
"FPGrowthModelWrapper.scala", and this is the problem.

My idea is the following:
1) In the fpm.py file: a class "AssociationRules" with one method and a class:
1.1) Method train(data, minConfidence), which will generate the association rules for the data
with the specified minConfidence (0.6 default). This method will call "trainAssociationRules"
from PythonMLLibAPI with the parameters data and minConfidence, and will then return an FPGrowthModel.
1.2) Class Rule, a namedtuple that represents an (antecedent, consequent) tuple.

2) Still in fpm.py, a new method called generateAssociationRules will be added to the class
FPGrowthModel. It will map the rules obtained by calling the method "getAssociationRules" from
FPGrowthModelWrapper to the namedtuple.
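
A minimal sketch of items 1.2 and 2 in plain Python, without Spark. The field names of Rule,
the helper name to_rules and the sample data are assumptions made for illustration only; the
real method would receive the pairs from the JVM wrapper instead of a list.

```python
from collections import namedtuple

# Hypothetical namedtuple for item 1.2; the field names are an assumption.
Rule = namedtuple("Rule", ["antecedent", "consequent"])


def to_rules(pairs):
    """Map raw (antecedent, consequent) pairs to Rule namedtuples."""
    return [Rule(list(antecedent), list(consequent))
            for antecedent, consequent in pairs]


# Made-up stand-in for the pairs the wrapper would hand back to Python.
raw = [(["a", "b"], ["c"]), (["a"], ["b"])]
rules = to_rules(raw)
```

With this shape, a caller could read rules[0].antecedent and rules[0].consequent by name
instead of indexing into a plain tuple.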

Now my question: how can I make trainAssociationRules return an FPGrowthModel, so that the wrapper
just maps each rule it receives to antecedent/consequent? I could not get the method
trainAssociationRules to return an FPGrowthModel. The wrapper for association rules is in
FPGrowthModelWrapper, right? Is something wrong with the idea?

For illustration, I am thinking of something like this in PythonMLLibAPI and in FPGrowthModelWrapper,
respectively:
{code:none}
//  PythonMLLibAPI.scala
def trainAssociationRules(
    data: JavaRDD[FPGrowth.FreqItemset[Any]],
    minConfidence: Double): [return type] = {

  // Note: generateAssociationRules returns an RDD of rules, not an FPGrowthModel
  val model = new FPGrowthModel(data.rdd)
    .generateAssociationRules(minConfidence)

  new FPGrowthModelWrapper(model) // will fail: the wrapper expects an FPGrowthModel
}
-----------------------------------------------------------------------
//  FPGrowthModelWrapper.scala
def getAssociationRules: [return type] = {
  // "rule" stands for the rules to be exposed; it is not defined yet
  SerDe.fromTuple2RDD(rule.map(x => (x.javaAntecedent, x.javaConsequent)))
}

{code}
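
As background for the minConfidence parameter used above: the confidence of a rule
antecedent => consequent is freq(antecedent + consequent) / freq(antecedent), and only rules at or
above the threshold are kept. Below is a small plain-Python sketch of that filter, not the Spark
implementation. The function name, the dict-of-frozensets input and the restriction to
single-item consequents are assumptions chosen to keep the example short.

```python
def rules_with_confidence(freq, min_confidence):
    """Derive rules from frequent itemsets.

    freq maps each frequent itemset (a frozenset) to its count.
    Returns (antecedent, consequent, confidence) triples with a
    single-item consequent and confidence >= min_confidence.
    """
    found = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a consequent
        for item in itemset:
            antecedent = itemset - {item}
            if antecedent in freq:
                confidence = count / freq[antecedent]
                if confidence >= min_confidence:
                    found.append((sorted(antecedent), [item], confidence))
    return found


# Made-up counts: "a" appears 4 times, "b" 3 times, both together 3 times.
freq = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 3,
    frozenset(["a", "b"]): 3,
}
found = rules_with_confidence(freq, 0.8)
```

Here a => b has confidence 3/4 and is dropped at a threshold of 0.8, while b => a has
confidence 3/3 and is kept.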

Any suggestions?

> Python API for Association Rules
> --------------------------------
>
>                 Key: SPARK-8855
>                 URL: https://issues.apache.org/jira/browse/SPARK-8855
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Feynman Liang
>            Priority: Minor
>
> A simple Python wrapper and doctests need to be written for Association Rules. The relevant
> method is {{FPGrowthModel.generateAssociationRules}}. The code will likely live in {{fpm.py}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

