mahout-dev mailing list archives

From Pat Ferrel <>
Subject Re: cf/cooccurrence code
Date Wed, 09 Jul 2014 15:44:00 GMT
Hmm, that doesn’t seem like a good idea. Since there is precedent, and for the sake of argument,
I’ll go ahead and do it, but:

1) it means the wrong module will fail a build test when the error is not in the test
2) it is a kind of lie about the dependencies of a module. A consumer would think they can
include only math-scala in a project, but some ill-defined parts of it are useless without
Spark, so no real separation can be made. I understand that this is so some hypothetical future
engine module can replace Spark, but it would have to come with an awful lot of stuff, including
many of the build tests for math-scala. This only adds to my concern over this approach and
will leave the real, current implementation on Spark misleading and confusing
in its structure.

But as I said for the sake of avoiding further argument I’ll separate impl from test.
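For reference, the “impl in math-scala, tests in spark” arrangement comes down to a pom fragment along these lines. This is only a sketch: the artifactIds and version properties are illustrative and not verified against the actual Mahout build.

```xml
<!-- In the spark module's pom.xml (sketch): the module pulls in math-scala,
     whose cf implementation its tests exercise, plus Spark itself so a
     SparkContext is available at test time. math-scala itself declares no
     Spark dependency, which is the separation being debated above. -->
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math-scala</artifactId>
  <version>${project.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
```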

On Jul 8, 2014, at 6:42 PM, Anand Avati <> wrote:

If that is the case, why not commit that much already (i.e., separate modules for code and tests),
since that has been the "norm" thus far (see DSSVD, DSPCA, etc.)? Fixing code-vs-test modules
could be a separate task/activity (which I'm happy to pick up), on which the cf code move need
not depend.

On Tue, Jul 8, 2014 at 6:14 PM, Pat Ferrel <> wrote:
I already did the code and tests in separate modules; that works but is not a good way to
go, imo. If there are tests that will work in math-scala, then we can put the code in math-scala.
I couldn’t find a way to do it.

On Jul 8, 2014, at 4:40 PM, Anand Avati <> wrote:

I'm not completely sure how to address this (code and tests in separate
modules) as I write, but I will give it a shot soon.

On Mon, Jul 7, 2014 at 9:18 AM, Pat Ferrel <> wrote:

> OK, I’m spending more time on this than I have to spare. The test class
> extends MahoutLocalContext, which provides an implicit Spark context. I
> haven’t found a way to test parallel execution of cooccurrence without it.
> So far the only obvious option is to put cf into math-scala but the tests
> would have to remain in spark and that seems like trouble so I’d rather not
> do that.
> I suspect that as more math-scala-consuming algos get implemented, this issue
> will proliferate. We will have implementations that do not require Spark
> but tests that do. We could create a new sub-project that allows for this, I
> suppose, but a new sub-project will require changes to SparkEngine and
> mahout’s script.
> If someone (Anand?) wants to offer a PR with some way around this I’d be
> happy to integrate.
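The coupling Pat describes above can be sketched in plain Scala. MahoutLocalContext is the real Mahout test trait; every other name here is a hypothetical stand-in, and a plain String plays the role of the implicit SparkContext so the sketch compiles without any Spark dependency.

```scala
// Sketch of the test-inheritance pattern discussed above (names other than
// MahoutLocalContext are invented for illustration).
trait LocalContextSketch {
  // In Mahout this would be an implicit org.apache.spark.SparkContext
  // provided by MahoutLocalContext; a String stands in here.
  implicit val ctx: String = "local[2]"
}

// A cooccurrence test extends the context trait; the inherited implicit
// flows into any engine-backed operation the test body exercises, which is
// why the test cannot live in a module with no Spark dependency.
object CooccurrenceTestSketch extends LocalContextSketch {
  private def runCooccurrence()(implicit context: String): String =
    s"cooccurrence computed on $context"

  // Inside the object, the inherited implicit ctx satisfies the parameter.
  def run(): String = runCooccurrence()
}
```

The point of the sketch is only the shape: the implementation under test needs no engine, but exercising it in parallel requires the implicit context the base trait supplies.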
> On Jun 30, 2014, at 5:39 PM, Pat Ferrel <> wrote:
> No argument, just trying to decide whether to create core-scala or keep
> dumping anything not Spark dependent in math-scala.
> On Jun 30, 2014, at 9:32 AM, Ted Dunning <> wrote:
> On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel <> wrote:
>> Speaking for Sebastian and Dmitriy (with some ignorance) I think the idea
>> was to isolate things with Spark dependencies, something like we did before
>> with Hadoop.
> Go ahead and speak for me as well here!
> I think isolating the dependencies is crucial for platform nimbleness
> (nimbility?)
