mahout-dev mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] Commented: (MAHOUT-510) Standardize serialization mechanisms
Date Sun, 16 Jan 2011 10:59:45 GMT


Sean Owen commented on MAHOUT-510:

(BTW I'm not committing this for some time.)

I've managed to take out almost all the usages of the JSON serialization. The only real
remaining usage is in the Dirichlet implementation, which uses it to serialize a
ModelDistribution and pass it as a string to Hadoop workers via the Configuration object.

Now, per the issue description, we could re-do serialization here to use Writable. That's
not hard, and it makes it possible to write these things out to HDFS later in a more
Hadoop-ish way. But that gives you a serialization to bytes, not a String. I could
Base64-encode it; it's not huge.
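
To make the Base64 idea concrete, here is a rough sketch of the round trip. It is only an illustration under assumptions: SimpleWritable is a stand-in that mirrors the two methods of Hadoop's Writable (write and readFields) so the example runs without Hadoop on the classpath, ToyModel and the "toy.model" key are hypothetical and are not Mahout's ModelDistribution, and a plain Map stands in for the Configuration object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class WritableToString {

    // Stand-in (assumption) with the same two methods as Hadoop's Writable.
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical model state, used only for illustration.
    public static class ToyModel implements SimpleWritable {
        public int clusters;
        public double alpha;

        public ToyModel() { }

        public ToyModel(int clusters, double alpha) {
            this.clusters = clusters;
            this.alpha = alpha;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(clusters);
            out.writeDouble(alpha);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            clusters = in.readInt();
            alpha = in.readDouble();
        }
    }

    // Serialize to bytes via write(), then Base64-encode to get a String.
    public static String encode(SimpleWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Base64-decode the String, then populate the target via readFields().
    public static void decode(String encoded, SimpleWritable target) {
        try {
            byte[] raw = Base64.getDecoder().decode(encoded);
            target.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A Map stands in for the job Configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        conf.put("toy.model", encode(new ToyModel(3, 0.5)));

        ToyModel restored = new ToyModel();
        decode(conf.get("toy.model"), restored);
        System.out.println(restored.clusters + " " + restored.alpha);
    }
}
```

The encoded string is about a third larger than the raw bytes, which is tolerable for a small model but is part of why this feels like a workaround.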

That's starting to get a little weird. Is the better answer to look at writing the
ModelDistribution to HDFS? Or should we just leave this use of JSON as it is?

> Standardize serialization mechanisms
> ------------------------------------
>                 Key: MAHOUT-510
>                 URL:
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>             Fix For: 0.5
>         Attachments: MAHOUT-510.patch
> At the moment this is tracking a broader concern: to standardize as much as possible
> how we approach serialization. The long-term goal is notionally to use the following
> "encodings" as the input/output of Mahout stuff, and by extension, probably internally too.
> - Text
> - Vector Writable
> - (maybe Avro)
> not
> - Serializable

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
