spark-reviews mailing list archives

From feynmanliang <>
Subject [GitHub] spark pull request: [SPARK-9898][MLlib] Prefix Span user guide
Date Tue, 18 Aug 2015 00:39:24 GMT
Github user feynmanliang commented on a diff in the pull request:
    --- Diff: docs/ ---
    @@ -96,3 +96,99 @@ for (FPGrowth.FreqItemset<String> itemset: model.freqItemsets().toJavaRDD().coll
    +## PrefixSpan
    +PrefixSpan is a sequential pattern mining algorithm described in
    +Pei et al., Mining Sequential Patterns by Pattern-Growth: The
    +PrefixSpan Approach. We refer the reader to the referenced paper
    +for a formal definition of the sequential pattern mining problem.
    +MLlib's PrefixSpan implementation takes the following parameters:
    +* `minSupport`: the minimum support required to be considered a frequent
    +  sequential pattern.
    +* `maxPatternLength`: the maximum length of a frequent sequential
    +  pattern. Any frequent pattern exceeding this length will not be
    +  included in the results.
    +* `maxLocalProjDBSize`: the maximum number of items allowed in a
    +  prefix-projected database before local iterative processing of the
    +  projected database begins. This parameter should be tuned with respect
    +  to the size of your executors.
    +The following example illustrates PrefixSpan running on the sequences
    +(using the same notation as Pei et al.):
    +  <(12)3>
    --- End diff --
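
To make the `minSupport` and `maxPatternLength` parameters quoted in the diff concrete, here is a minimal, self-contained Python sketch of how the support of a sequential pattern can be counted. It is not MLlib's implementation and does not use the Spark API. The helper names (`contains`, `support`, `is_frequent`, `pattern_length`) are hypothetical, `minSupport` is assumed to be a fraction of the number of sequences, and pattern length is assumed to be the total number of items. Only the first sequence, `<(12)3>`, comes from the diff; the other three are made up for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[int]
Sequence = List[Itemset]


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` is a subsequence of `sequence`: its itemsets can be
    matched, in order, to itemsets of `sequence` that are supersets of them."""
    pos = 0
    for itemset in sequence:
        # Greedily match the next pattern itemset at the earliest position.
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(db: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def is_frequent(db: List[Sequence], pattern: Sequence, min_support: float) -> bool:
    """Assumption: min_support is a fraction of the database size."""
    return support(db, pattern) >= min_support * len(db)


def pattern_length(pattern: Sequence) -> int:
    """Assumption: length is the total number of items across all itemsets."""
    return sum(len(itemset) for itemset in pattern)


db = [
    [frozenset({1, 2}), frozenset({3})],                     # <(12)3>, from the diff
    [frozenset({1}), frozenset({3, 2}), frozenset({1, 2})],  # illustrative only
    [frozenset({1, 2}), frozenset({5})],                     # illustrative only
    [frozenset({6})],                                        # illustrative only
]

pattern = [frozenset({1}), frozenset({3})]  # <13>
print(support(db, pattern), is_frequent(db, pattern, 0.5), pattern_length(pattern))
```

Under these assumptions, a miner would report a pattern only if `is_frequent` holds and `pattern_length` does not exceed `maxPatternLength`.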

