mahout-dev mailing list archives

From David Hall <d...@cs.stanford.edu>
Subject Re: Introduction for student interested in GSoC
Date Thu, 26 Mar 2009 01:53:57 GMT
On Wed, Mar 25, 2009 at 6:41 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Groovy closures are just objects as well, but they can't easily be
> serialized because they can capture references to other objects which are
> unlikely to exist on the far machine.

Same problem in Scala... But I just punt and assume people behave.
Strong assumption, but the one heavy user of SMR (me) has so far not
had much trouble doing that. :-)

>
> Can you say more about the compiler plugin?  Or provide a pointer?

All it does is make all anonymous closures implement
java.io.Serializable. The Scala compiler is unnecessarily picky by
default.
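
To make that concrete, here is a minimal sketch of the round trip using plain Java serialization. It is not SMR's actual code or the plugin's output: AddA, ClosureRoundTrip, toBytes and fromBytes are hypothetical names, and AddA is a hand-written stand-in for the anonymous class the compiler would generate.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a compiler-generated closure class.
// It captures `a` by value and is explicitly marked Serializable.
class AddA(a: Int) extends Function1[(Int, Int), (Int, Int)] with java.io.Serializable {
  def apply(kv: (Int, Int)): (Int, Int) = (kv._2, kv._1 + a)
}

object ClosureRoundTrip {
  // Write an object to a byte array, as if sending it down the wire.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Rebuild an object from bytes, as the receiving node would.
  def fromBytes[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val a = 3
    val pairs = List((1, 10), (2, 20))
    // The function object travels as bytes; the captured `a` goes with it.
    val shipped = fromBytes[AddA](toBytes(new AddA(a)))
    println(pairs.map(shipped)) // List((10,4), (20,5))
  }
}
```

Only the captured value travels with the function object, which is why references to things that don't exist on the far machine are the real hazard.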

>
> Also, in your example here, how would you deal with the situation where a is
> incremented in map closure?  Just punt and say undefined?

Scala encourages immutability for a reason :-)
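
A minimal sketch of why mutation is a problem, assuming the closure is shipped by Java serialization: the far side gets a copy of the captured state, so an increment there never reaches the original. CountingFn, MutationDemo and copyViaSerialization are hypothetical names, and CountingFn is a hand-written stand-in for a closure that captures a var.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a map closure that increments a captured variable.
class CountingFn(var a: Int) extends Function1[Int, Int] with java.io.Serializable {
  def apply(x: Int): Int = { a += 1; x + a }
}

object MutationDemo {
  // Serialize and immediately deserialize, simulating a trip to another node.
  def copyViaSerialization[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val local = new CountingFn(3)
    val remote = copyViaSerialization(local)
    List(1, 2, 3).map(remote) // mutates only the deserialized copy
    println(local.a)  // 3: the original never sees the increments
    println(remote.a) // 6
  }
}
```

With several nodes each holding its own copy, there is no single value of a to speak of, so "undefined" is about as good as it gets without an explicit communication construct.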

The "right" answer is almost certainly to define easy-to-use
constructs to communicate between the nodes when needed. I have
ThreadLocal[T], which avoids serialization.

Actually, since we're on this topic: the Wolfe et al. paper I cited at
the beginning raises the concern that MapReduce isn't the right
paradigm for a lot of ML, and that you need to do clever things like
using a junction tree topology to get better performance.

-- David

>
> On Wed, Mar 25, 2009 at 1:23 PM, David Hall <dlwh@cs.stanford.edu> wrote:
>
>> Scala closures are just objects. With the compiler plugin I wrote, it's
>> trivial to serialize closures and send them down the wire. In fact,
>> that's how SMR works at the moment.
>>
>> val a = 3
>>
>> for( (k,v) <- pairs) yield (v,k+ a)
>>
>> translates to
>>
>> pairs.map( new anonfun$obfuscationgarbage$1(a) )
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
