spark-issues mailing list archives

From "Arseniy Tashoyan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23269) FP-growth: Provide last transaction for each detected frequent pattern
Date Tue, 30 Jan 2018 09:52:00 GMT
Arseniy Tashoyan created SPARK-23269:
----------------------------------------

             Summary: FP-growth: Provide last transaction for each detected frequent pattern
                 Key: SPARK-23269
                 URL: https://issues.apache.org/jira/browse/SPARK-23269
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.1
            Reporter: Arseniy Tashoyan


The FP-growth implementation returns patterns and their frequencies:

_model.freqItemsets_:
||items||freq||
|[5]|3|
|[5, 1]|3|

It would be great to know when each pattern last occurred, that is, which transaction
was the most recent one containing the pattern.

To do so, FPGrowth would need to be told which column of the transactions data frame
holds the timestamp:
{code:scala}
import org.apache.spark.ml.fpm.FPGrowth

val fpgrowth = new FPGrowth()
  .setItemsCol("items")
  // Proposed setter; not part of the current FPGrowth API
  .setTimestampCol("timestamp")
{code}
The data frame with patterns could then look like:
||items||freq||lastOccurrence||
|[5]|3|2018-01-01 12:15:00|
|[5, 1]|3|2018-01-01 12:15:00|

Without this functionality, it is necessary to traverse the transactions data frame again
with the set of detected patterns and determine the last transaction for each pattern. Why
traverse the transactions a second time when FP-growth has already done so during execution?
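
For illustration, the workaround described above might look roughly like the sketch below. It is not part of any Spark API: the sample data, the _containsAll_ helper, the renamed _pattern_ column and the _minSupport_ value are assumptions made for this example, and it is meant to be pasted into spark-shell or wrapped in an application object. It cross-joins every detected pattern with every transaction, which is exactly the extra pass this issue would like to avoid.
{code:scala}
import java.sql.Timestamp

import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, udf}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fpgrowth-last-occurrence")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample transactions: items plus the time of the transaction
val transactions = Seq(
  (Seq(1, 5), Timestamp.valueOf("2018-01-01 10:00:00")),
  (Seq(1, 5, 2), Timestamp.valueOf("2018-01-01 11:00:00")),
  (Seq(5, 1), Timestamp.valueOf("2018-01-01 12:15:00")),
  (Seq(3), Timestamp.valueOf("2018-01-01 13:00:00"))
).toDF("items", "timestamp")

// First pass over the transactions: mine the frequent patterns
val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.5)
  .fit(transactions)

// Rename to avoid a clash with the "items" column of the transactions
val patterns = model.freqItemsets.withColumnRenamed("items", "pattern")

// Helper (assumption of this sketch): true if the transaction holds every item of the pattern
val containsAll = udf { (pattern: Seq[Int], items: Seq[Int]) =>
  pattern.forall(p => items.contains(p))
}

// Second pass: match each pattern against each transaction, keep the latest timestamp
val lastOccurrence = patterns
  .crossJoin(transactions)
  .where(containsAll(col("pattern"), col("items")))
  .groupBy(col("pattern"), col("freq"))
  .agg(max(col("timestamp")).as("lastOccurrence"))

lastOccurrence.show(truncate = false)
{code}
The cost of the second pass grows with the number of patterns times the number of transactions, which is why carrying the timestamp through the FP-growth run itself would be preferable.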

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
