mahout-dev mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms
Date Mon, 17 Jan 2011 17:36:58 GMT
Ah, ok, now I see what you are talking about. This is a bit of laziness 
on my part that I forgot about. The ModelDistribution is produced from 
3-4 argument values (modelFactory, modelPrototype, distanceMeasure, 
prototypeSize) from the command line. You could just pass those argument 
values (all strings) and rebuild the ModelDistribution from those when 
needed.
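
A minimal sketch of that idea, under some loud assumptions: a plain Map<String, String> stands in for the Hadoop job Configuration so the example runs without Hadoop on the classpath, and the class name ModelDistributionArgs and the key names are hypothetical, not taken from Mahout. It only shows the four values going in as strings on the driver side and coming back out on the worker side; actually rebuilding the ModelDistribution from them is left out.

```java
import java.util.Map;

// Hypothetical holder for the command-line values named above.
// This is not a Mahout class; it only illustrates the round trip.
class ModelDistributionArgs {
    // Hypothetical key names, chosen for this sketch only.
    static final String MODEL_FACTORY_KEY = "example.dirichlet.modelFactory";
    static final String MODEL_PROTOTYPE_KEY = "example.dirichlet.modelPrototype";
    static final String DISTANCE_MEASURE_KEY = "example.dirichlet.distanceMeasure";
    static final String PROTOTYPE_SIZE_KEY = "example.dirichlet.prototypeSize";

    final String modelFactory;
    final String modelPrototype;
    final String distanceMeasure;
    final int prototypeSize;

    ModelDistributionArgs(String modelFactory, String modelPrototype,
                          String distanceMeasure, int prototypeSize) {
        this.modelFactory = modelFactory;
        this.modelPrototype = modelPrototype;
        this.distanceMeasure = distanceMeasure;
        this.prototypeSize = prototypeSize;
    }

    // Driver side: store every value as a plain string.
    // The Map is an assumed stand-in for the job Configuration.
    void writeTo(Map<String, String> conf) {
        conf.put(MODEL_FACTORY_KEY, modelFactory);
        conf.put(MODEL_PROTOTYPE_KEY, modelPrototype);
        conf.put(DISTANCE_MEASURE_KEY, distanceMeasure);
        conf.put(PROTOTYPE_SIZE_KEY, Integer.toString(prototypeSize));
    }

    // Worker side: read the strings back; the caller would then
    // rebuild the ModelDistribution from them (not shown here).
    static ModelDistributionArgs readFrom(Map<String, String> conf) {
        return new ModelDistributionArgs(
                require(conf, MODEL_FACTORY_KEY),
                require(conf, MODEL_PROTOTYPE_KEY),
                require(conf, DISTANCE_MEASURE_KEY),
                Integer.parseInt(require(conf, PROTOTYPE_SIZE_KEY)));
    }

    // Fail early with a clear message if the driver forgot a value.
    private static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing configuration value: " + key);
        }
        return value;
    }
}
```

In a real job the two Map calls would presumably become set and get on the Configuration, and no JSON is involved at any point.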


On 1/17/11 10:23 AM, Sean Owen wrote:
> The idea was to remove all use of JSON in an attempt to reduce the number of
> different serialization approaches used. So at the moment I'm trying to
> figure out what happens when I delete everything related to JSON. Most of it
> goes quietly.
>
> The only use that seems, well, actively used is the bit in DirichletDriver
> where...
>
>      conf.set(MODEL_DISTRIBUTION_KEY, modelDistribution.asJsonString());
>
> ... the ModelDistribution is serialized to a String and stuffed in the job's
> Configuration object. The DirichletMapper.getDirichletState() method then
> deserializes. In this way the model distribution is passed to workers via
> Configuration.
>
> As Ted says it seems like a minor abuse of "Configuration" but entirely
> practical. Nothing's really wrong there other than the idea that perhaps
> it'd be more uniform to pass this on the file system. Maybe at some point it
> gets too big anyway to handle this way.
>
> That's the only outstanding question for MAHOUT-510 at the moment as far as
> I am concerned.
>
>
> On Mon, Jan 17, 2011 at 5:17 PM, Jeff Eastman <jdog@windwardsolutions.com> wrote:
>
>> Dirichlet uses Writable to serialize its iteration output state (to
>> clusters-n). I'm confused about what you're trying to do.
>>
>>
>>
>> On 1/17/11 9:58 AM, Ted Dunning wrote:
>>
>>> This sort of thing is what the distributed cache was designed for.
>>>
>>> On Mon, Jan 17, 2011 at 8:53 AM, Sean Owen <srowen@gmail.com> wrote:
>>>
>>>> Do you think the way forward is to leave it, or use Writable and write the
>>>> model distribution to a file, or something else?
>>>>
>>>>

