flink-issues mailing list archives

From gaborhermann <...@git.apache.org>
Subject [GitHub] flink issue #2542: [FLINK-4613] Extend ALS to handle implicit feedback datas...
Date Fri, 23 Sep 2016 14:46:28 GMT
Github user gaborhermann commented on the issue:

    We have not yet measured performance against Spark or other implementations. Such a
comparison would mostly reflect the performance of the Flink ALS implementation in general,
as there is not much difference between the implicit and explicit implementations.
    Instead, we compared the implicit case with the explicit case in the Flink implementation
on the same datasets, to make sure the implicit case does not decrease the performance significantly.
(Of course, we expected the implicit case to be slower due to the extra precomputation and
broadcasting of `Xt * X`.)
    size     expl (ms)  impl (ms)
    100           8885       9196
    1000          7879      11282
    10000         8839       9220
    100000        7102      10998
    1000000       7543      10680
    The numbers in the left column indicate the size of the training set (I'm not sure about
the unit, but @jfeher can tell us more). The other two columns give the training time in
milliseconds for the explicit and implicit cases, respectively. We ran the measurements on
a small cluster of 3 nodes.
    It seems there is a large constant overhead, but the implicit case is not significantly
slower.
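
For reference, the per-row slowdown can be read off the table with a few lines of Python. This is only arithmetic on the numbers reported above; the variable names are made up for illustration.

```python
# Training times in milliseconds, copied from the table above:
# (training set size, explicit, implicit)
measurements = [
    (100, 8885, 9196),
    (1000, 7879, 11282),
    (10000, 8839, 9220),
    (100000, 7102, 10998),
    (1000000, 7543, 10680),
]

# Ratio of implicit to explicit training time for each training set size.
ratios = {size: impl / expl for size, expl, impl in measurements}

for size, ratio in ratios.items():
    print(f"{size:>8}: implicit/explicit = {ratio:.2f}")
```

The ratio does not grow steadily with the training set size, which fits the reading that a constant overhead dominates these runs.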
    We could do further, more thorough measurements if needed, but maybe that belongs in a
separate issue. Benchmarking more and optimizing both the original ALS algorithm and the
specific `Xt * X` computation in the implicit case could be a separate PR.
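
As a rough illustration of the `Xt * X` precomputation (a plain-Python sketch, not the code in this PR): assuming `X` holds one factor row of length k per user or item, `Xt * X` is a k x k matrix that can be built as a sum of per-row outer products. Partial sums from separate partitions can then be added together, and the small result is cheap to broadcast. The helper names `gram_matrix` and `add_matrices` and the sample data are hypothetical.

```python
def gram_matrix(rows):
    """Return Xt * X as a k x k list of lists, for factor rows of length k."""
    rows = list(rows)
    if not rows:
        return []
    k = len(rows[0])
    gram = [[0.0] * k for _ in range(k)]
    for row in rows:
        # Add the outer product of this row with itself.
        for i in range(k):
            for j in range(k):
                gram[i][j] += row[i] * row[j]
    return gram


def add_matrices(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical factor rows (k = 2), split across two partitions.
partition_1 = [[1.0, 2.0], [3.0, 4.0]]
partition_2 = [[5.0, 6.0]]

# Each partition computes its partial Gram matrix; the partials are summed.
combined = add_matrices(gram_matrix(partition_1), gram_matrix(partition_2))

# Same result as computing Xt * X over all rows at once.
full = gram_matrix(partition_1 + partition_2)
print(combined == full)
```

The cost of this step depends on the number of factor rows and on k, so it is one place where further benchmarking could separate fixed overhead from data-dependent work.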
    What are your thoughts on this?
