mahout-dev mailing list archives

From Jeff Eastman
Subject Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms
Date Mon, 17 Jan 2011 17:36:58 GMT
Ah, ok, now I see what you are talking about. This is a bit of laziness 
on my part that I forgot about. The ModelDistribution is produced from 
3-4 argument values (modelFactory, modelPrototype, distanceMeasure, 
prototypeSize) from the command line. You could just pass those argument 
values (all strings) and rebuild the ModelDistribution from those when 
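
A minimal sketch of passing the argument values as plain strings and reading them back on the worker side. It is not the actual Mahout code: a plain Map<String, String> stands in for the job's Hadoop Configuration so the example runs on its own, and the class name, the key names, and the example values are all hypothetical. How the ModelDistribution itself is rebuilt from the recovered values is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map<String, String> stands in for Hadoop's Configuration,
// and every key and class name used here is hypothetical.
final class ModelDistributionConfigSketch {

    // Hypothetical configuration keys, one per command-line argument.
    static final String MODEL_FACTORY_KEY = "sketch.dirichlet.modelFactory";
    static final String MODEL_PROTOTYPE_KEY = "sketch.dirichlet.modelPrototype";
    static final String DISTANCE_MEASURE_KEY = "sketch.dirichlet.distanceMeasure";
    static final String PROTOTYPE_SIZE_KEY = "sketch.dirichlet.prototypeSize";

    // Holds the four argument values named in the message.
    static final class Params {
        final String modelFactory;
        final String modelPrototype;
        final String distanceMeasure;
        final int prototypeSize;

        Params(String modelFactory, String modelPrototype,
               String distanceMeasure, int prototypeSize) {
            this.modelFactory = modelFactory;
            this.modelPrototype = modelPrototype;
            this.distanceMeasure = distanceMeasure;
            this.prototypeSize = prototypeSize;
        }
    }

    // Driver side: store each argument as a string under its own key.
    static void store(Map<String, String> conf, Params p) {
        conf.put(MODEL_FACTORY_KEY, p.modelFactory);
        conf.put(MODEL_PROTOTYPE_KEY, p.modelPrototype);
        conf.put(DISTANCE_MEASURE_KEY, p.distanceMeasure);
        conf.put(PROTOTYPE_SIZE_KEY, Integer.toString(p.prototypeSize));
    }

    // Worker side: read the strings back; fail loudly if one is missing.
    static Params load(Map<String, String> conf) {
        return new Params(
            require(conf, MODEL_FACTORY_KEY),
            require(conf, MODEL_PROTOTYPE_KEY),
            require(conf, DISTANCE_MEASURE_KEY),
            Integer.parseInt(require(conf, PROTOTYPE_SIZE_KEY)));
    }

    private static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Placeholder class names, not real Mahout classes.
        store(conf, new Params("com.example.MyModelFactory",
                               "com.example.MyVectorPrototype",
                               "com.example.MyDistanceMeasure", 2));
        Params rebuilt = load(conf);
        System.out.println(rebuilt.modelFactory + " / " + rebuilt.prototypeSize);
    }
}
```

With this shape, only four short strings travel in the job configuration, and no JSON encoding of the distribution object is needed.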

On 1/17/11 10:23 AM, Sean Owen wrote:
> The idea was to remove all use of JSON in an attempt to reduce the number of
> different serialization approaches used. So at the moment I'm trying to
> figure out what happens when I delete everything related to JSON. Most of it
> goes quietly.
> The only use that seems, well, actively used is the bit in DirichletDriver
> where...
>      conf.set(MODEL_DISTRIBUTION_KEY, modelDistribution.asJsonString());
> ... the ModelDistribution is serialized to a String and stuffed in the job's
> Configuration object. The DirichletMapper.getDirichletState() method then
> deserializes. In this way the model distribution is passed to workers via
> Configuration.
> As Ted says it seems like a minor abuse of "Configuration" but entirely
> practical. Nothing's really wrong there other than the idea that perhaps
> it'd be more uniform to pass this on the file system. Maybe at some point it
> gets too big anyway to handle this way.
> That's the only outstanding question for MAHOUT-510 at the moment as far as
> I am concerned.
> On Mon, Jan 17, 2011 at 5:17 PM, Jeff Eastman wrote:
>> Dirichlet uses Writable to serialize its iteration output state (to
>> clusters-n). I'm confused about what you're trying to do.
>> On 1/17/11 9:58 AM, Ted Dunning wrote:
>>> This sort of thing is what the distributed cache was designed for.
>>> On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen wrote:
>>>> Do you think the way forward is to leave it, or use Writable and write the
>>>> model distribution to a file, or something else?
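
For comparison, the Writable alternative raised in the quoted thread can be sketched as a binary round trip. This is an illustration only, not Mahout's code: the local WritableLike interface copies the two method signatures of Hadoop's Writable (write(DataOutput) and readFields(DataInput)) so the example has no Hadoop dependency, the ModelParams class and its fields are hypothetical, and a byte array stands in for the file that would be written to the file system or shipped through the distributed cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: WritableLike mirrors the shape of Hadoop's Writable interface;
// ModelParams and its fields are hypothetical.
final class WritableRoundTripSketch {

    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    static final class ModelParams implements WritableLike {
        String modelFactory = "";
        String modelPrototype = "";
        String distanceMeasure = "";
        int prototypeSize;

        @Override
        public void write(DataOutput out) throws IOException {
            // Fields are written in a fixed order...
            out.writeUTF(modelFactory);
            out.writeUTF(modelPrototype);
            out.writeUTF(distanceMeasure);
            out.writeInt(prototypeSize);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // ...and must be read back in exactly the same order.
            modelFactory = in.readUTF();
            modelPrototype = in.readUTF();
            distanceMeasure = in.readUTF();
            prototypeSize = in.readInt();
        }
    }

    // Serialize to bytes; a job would write these to a file instead.
    static byte[] toBytes(WritableLike w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Populate an existing instance from previously serialized bytes.
    static void fromBytes(WritableLike w, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ModelParams original = new ModelParams();
        original.modelFactory = "com.example.MyModelFactory"; // placeholder name
        original.prototypeSize = 2;

        ModelParams copy = new ModelParams();
        fromBytes(copy, toBytes(original));
        System.out.println(copy.modelFactory + " / " + copy.prototypeSize);
    }
}
```

The trade-off against the Configuration approach is the one discussed in the thread: a file-based route is more uniform with the other Writable output and is not limited by what fits comfortably in the configuration, at the cost of an extra file to manage.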
