flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2211) Generalize ALS API
Date Wed, 17 Jun 2015 10:36:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589604#comment-14589604 ]

Till Rohrmann commented on FLINK-2211:
--------------------------------------

If you have more than {{2^31 - 1}} items, then you clearly need {{Long}} IDs for them. However,
a single item block cannot contain more than {{2^31 - 1}} item vectors, because they are stored
in an array. By increasing the number of item blocks, one can decrease the number of items
per block so that no block contains more than {{2^31 - 1}} items. I think this per-block limit
is a fair assumption, since you usually cannot keep an array of {{#itemsPerBlock *
#latentFactors * sizeOfDouble}} bytes with {{#itemsPerBlock >> 2^31 - 1}} in memory
anyway. Furthermore, IMO it's safe to assume that {{#latentFactors < 2^31 - 1}}.

> Generalize ALS API
> ------------------
>
>                 Key: FLINK-2211
>                 URL: https://issues.apache.org/jira/browse/FLINK-2211
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>    Affects Versions: 0.9
>            Reporter: Ronny Bräunlich
>            Priority: Minor
>
> predict() and fit() currently require DataSet[(Int, Int)] and DataSet[(Int, Int, Double)],
> respectively.
> This should be changed to Long to accept more values or to something more general.
> See also http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Apache-Flink-0-9-ALS-API-td6424.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
