spark-reviews mailing list archives

From tashoyan <>
Subject [GitHub] spark pull request #20578: [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input...
Date Sun, 11 Feb 2018 15:28:29 GMT
GitHub user tashoyan opened a pull request:

    [SPARK-23318][ML] FP-growth: WARN FPGrowth: Input data is not cached

    ## What changes were proposed in this pull request?
    Cache the RDD of items in ml.FPGrowth before passing it to mllib.FPGrowth. The RDD is
    cached only when the user has not already cached the input dataset of transactions. This
    removes the warning about uncached input data that mllib.FPGrowth logs.
    ## How was this patch tested?
    1. Run ml.FPGrowthExample - warning is there
    2. Apply the fix
    3. Run ml.FPGrowthExample again - no warning anymore
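
The conditional caching described under "What changes were proposed" might look roughly like the sketch below. It is an illustration, not the code from this pull request: the object `CacheItemsSketch`, the helper `mineFrequentItemsets`, the `MEMORY_AND_DISK` storage level, and the sample data are all assumptions made for the example. The only Spark calls it relies on are `Dataset.storageLevel`, `RDD.persist`, `RDD.unpersist`, and `mllib.fpm.FPGrowth.run`.

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheItemsSketch {

  // Hypothetical helper (not from the pull request): runs mllib FP-growth on
  // one array-of-strings column and returns the number of frequent itemsets.
  def mineFrequentItemsets(
      transactions: DataFrame,
      itemsCol: String,
      minSupport: Double): Long = {
    // Take over persistence only when the caller has not cached the input.
    val handlePersistence = transactions.storageLevel == StorageLevel.NONE

    val items: RDD[Array[String]] = transactions
      .select(itemsCol)
      .rdd
      .map(row => row.getSeq[String](0).toArray)

    // Assumed storage level; the pull request text does not name one.
    if (handlePersistence) items.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      new FPGrowth().setMinSupport(minSupport).run(items).freqItemsets.count()
    } finally {
      // Release only what this helper persisted itself.
      if (handlePersistence) items.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-items-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up transactions, for illustration only.
    val transactions = Seq(Seq("a", "b"), Seq("a", "c"), Seq("a")).toDF("items")
    println(mineFrequentItemsets(transactions, "items", 0.5))
    spark.stop()
  }
}
```

Unpersisting in a `finally` block keeps the temporary cache from outliving the call, and skipping both steps when the caller has already cached the dataset avoids overriding the caller's choice of storage level.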

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-23318

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20578

commit d17d3fbee84fcb0072d3030f3118ca18ce783e0c
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-10T21:16:51Z

    [SPARK-23318][ML] Workaround for 'ArrayStoreException: [Ljava.lang.Object' when trying
    to cache the RDD of items.

commit e0eb8519bf09db12f5d5bc426eaf17d6488e05c1
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-11T15:21:39Z

    [SPARK-23318][ML] Cache the RDD of items if the user did not cache the input dataset of
    transactions. This should eliminate the warning about uncached data in mllib.FPGrowth.

commit 374a49c2bf447f3ddfed655f6eda9c8cd5f45285
Author: Arseniy Tashoyan <tashoyan@...>
Date:   2018-02-11T15:23:58Z

    Merge remote-tracking branch 'upstream/master' into SPARK-23318


