mahout-dev mailing list archives

From "Anand Avati (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations
Date Tue, 20 May 2014 03:26:38 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002743#comment-14002743 ]

Anand Avati commented on MAHOUT-1529:
-------------------------------------

[~dlyubimov], I had a quick look at the commits, and the separation looks a lot cleaner now.
Some comments:

- Should DrmLike really be a generic class DrmLike[T] with T unbounded? For example,
it does not make sense to have DrmLike[String]. The only meaningful ones are probably DrmLike[Int]
and DrmLike[Double]. Is there some way we can restrict DrmLike to just Int and Double? Or standardize
on just Double? While RDD supports arbitrary T, H2O supports only numeric types, which is sufficient
for Mahout's needs.

- I am toying around with the new separation, building a pure, from-scratch local/in-memory
"backend" that communicates via Java serialization over byte-array streams. I hope this
will not only serve as a reference for future backend implementors, but also help keep
the algorithm test cases inside math-scala. Thoughts?
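One possible way to restrict the type parameter, as asked in the first point above, is a sealed type class with instances only for Int and Double. This is a minimal sketch under that assumption, not Mahout's actual API: DrmKey and SimpleDrm are hypothetical names, with SimpleDrm standing in for DrmLike.

```scala
// Hypothetical sketch: a sealed type class whose only instances are Int and Double.
sealed trait DrmKey[K] {
  def name: String
}

object DrmKey {
  // Instances live in the companion, so they are found without any import.
  implicit object IntKey extends DrmKey[Int] { val name = "Int" }
  implicit object DoubleKey extends DrmKey[Double] { val name = "Double" }
}

// Hypothetical stand-in for DrmLike: constructing it requires DrmKey evidence for K.
final class SimpleDrm[K: DrmKey](val rows: Map[K, Array[Double]]) {
  def keyTypeName: String = implicitly[DrmKey[K]].name
  def nrow: Int = rows.size
}

val ok = new SimpleDrm(Map(1 -> Array(1.0, 2.0), 2 -> Array(3.0, 4.0)))
println(s"${ok.keyTypeName} keys, ${ok.nrow} rows") // prints "Int keys, 2 rows"
// new SimpleDrm(Map("a" -> Array(1.0)))  // does not compile: no DrmKey[String]
```

In this sketch a DrmLike[String] equivalent is rejected at compile time, and because DrmKey is sealed, no other file can add further instances.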
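The local/in-memory backend idea in the second point could pass data blocks through plain Java serialization. A minimal sketch, assuming a block is just a dense Array[Array[Double]]; toBytes, fromBytes and remoteTimes are hypothetical helpers built on ObjectOutputStream/ObjectInputStream layered over ByteArrayOutputStream/ByteArrayInputStream.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Serialize any Java-serializable value into an in-memory byte array.
def toBytes(obj: AnyRef): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  try oos.writeObject(obj) finally oos.close()
  bos.toByteArray()
}

// Read a value back from a byte array produced by toBytes.
def fromBytes[T](bytes: Array[Byte]): T = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  try ois.readObject().asInstanceOf[T] finally ois.close()
}

// Hypothetical "remote" operation: the worker side only ever sees bytes,
// mimicking a network hop between driver and worker.
def remoteTimes(payload: Array[Byte], factor: Double): Array[Byte] = {
  val block = fromBytes[Array[Array[Double]]](payload)
  toBytes(block.map(_.map(_ * factor)))
}

val input = Array(Array(1.0, 2.0), Array(3.0, 4.0))
val result = fromBytes[Array[Array[Double]]](remoteTimes(toBytes(input), 2.0))
println(result.map(_.mkString(" ")).mkString("; ")) // prints "2.0 4.0; 6.0 8.0"
```

Because every value crosses a byte array, a non-serializable payload would likely fail in such a local backend much as it would when shipped to a distributed one, which is part of what could make it useful for algorithm tests.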

> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1529
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1529
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> We have a few situations when algorithm-facing API has Spark dependencies creeping in.
>
> In particular, we know of the following cases:
> -(1) checkpoint() accepts Spark constant StorageLevel directly;-
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" packages.
> (5) drmBroadcast returns a Spark-specific Broadcast object
> *Current tracker:* https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-1529.
> *Pull requests are welcome*.
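Items (1) and (5) in the quoted description both come down to Spark types (StorageLevel, Broadcast) leaking into the algorithm-facing API. One way to avoid this is to define backend-neutral types on the math-scala side and let each backend map them onto its own. A minimal sketch under that assumption; CacheHint, BCast, Engine, LocalEngine and scaleAll are hypothetical names, not existing Mahout classes.

```scala
// Hypothetical backend-neutral caching hint, used instead of Spark's StorageLevel.
sealed trait CacheHint
object CacheHint {
  case object NoCache extends CacheHint
  case object MemoryOnly extends CacheHint
  case object MemoryAndDisk extends CacheHint
  case object DiskOnly extends CacheHint
}

// Hypothetical backend-neutral broadcast handle, used instead of Spark's Broadcast[T].
trait BCast[T] {
  def value: T
}

// Hypothetical engine interface: each backend maps the neutral types onto its own.
trait Engine {
  def storageName(hint: CacheHint): String
  def broadcast[T](value: T): BCast[T]
}

// Trivial in-process engine: no cluster, so a "broadcast" just wraps the value.
object LocalEngine extends Engine {
  def storageName(hint: CacheHint): String = hint match {
    case CacheHint.NoCache       => "none"
    case CacheHint.MemoryOnly    => "memory"
    case CacheHint.MemoryAndDisk => "memory+disk"
    case CacheHint.DiskOnly      => "disk"
  }

  def broadcast[T](v: T): BCast[T] = new BCast[T] { val value: T = v }
}

// Algorithm-side code sees only Engine and BCast, never a Spark type.
def scaleAll(xs: Seq[Double], factor: Double)(implicit engine: Engine): Seq[Double] = {
  val bcFactor = engine.broadcast(factor)
  xs.map(_ * bcFactor.value)
}

implicit val engine: Engine = LocalEngine
println(scaleAll(Seq(1.0, 2.0, 3.0), 10.0)) // prints "List(10.0, 20.0, 30.0)"
```

In this sketch a Spark engine would implement the same trait, translating each CacheHint into a StorageLevel and returning a BCast that wraps the result of SparkContext.broadcast; that part is not shown because it needs a Spark dependency.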



--
This message was sent by Atlassian JIRA
(v6.2#6252)
