commons-user mailing list archives

Subject Re: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices
Date Tue, 21 Oct 2008 08:54:55 GMT
I am a little puzzled by this topic.

On the one hand, I have tried many times to do such things and have always failed. I now really think
that persistence/serialization/transfer/interoperability is a complex task in itself and is completely
out of scope for very low-level components. It belongs at least to a middle-level layer (did
I say middleware?).

There are many different use cases for data storage/transfer, from an internal algorithm
representation to something more external or longer-lived. In some cases, basic data will
be enough and its meaning will already be known to both sides of the communication, so meta-data
would be cumbersome. In other cases meta-data is a great improvement (think of matrix shapes
or non-null elements in sparse cases) but data can still be exchanged without it. In still
other cases meta-data is mandatory. Nor will people agree on what the meta-data should contain.

For these reasons, I tend to promote a separated approach: low-level layers provide access
to basic information (both data and things that could be considered meta-data from the outside)
through their API (getEntry(i, j), getRowDimension(), isTriangular() ...) and a dedicated
project in a middle-level layer uses it for externalization. This project would already be
difficult enough.
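
To make the separation concrete, here is a minimal sketch in Java. Everything in it is hypothetical
and for illustration only: MatrixView, DenseMatrix and MatrixExporter are invented names, not
commons-math or Mahout classes, and the JSON layout is just one possible choice. The low-level class
exposes only accessors like those named above; the exporter lives in a separate layer and relies on
nothing else.

```java
// Hypothetical read-only view exposing only accessors like those named in the message.
interface MatrixView {
    int getRowDimension();
    int getColumnDimension();
    double getEntry(int row, int column);
}

// Low-level layer: a plain dense matrix that knows nothing about serialization.
final class DenseMatrix implements MatrixView {
    private final double[][] data;

    DenseMatrix(double[][] data) {
        this.data = data;
    }

    public int getRowDimension() { return data.length; }
    public int getColumnDimension() { return data.length == 0 ? 0 : data[0].length; }
    public double getEntry(int row, int column) { return data[row][column]; }
}

// Middle-level layer: externalization built purely on the public accessors.
// Assumption: entries are finite (NaN and Infinity are not valid JSON numbers).
final class MatrixExporter {
    private MatrixExporter() {}

    static String toJson(MatrixView m) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"rows\":").append(m.getRowDimension())
          .append(",\"columns\":").append(m.getColumnDimension())
          .append(",\"data\":[");
        for (int i = 0; i < m.getRowDimension(); i++) {
            if (i > 0) sb.append(',');
            sb.append('[');
            for (int j = 0; j < m.getColumnDimension(); j++) {
                if (j > 0) sb.append(',');
                sb.append(m.getEntry(i, j));
            }
            sb.append(']');
        }
        return sb.append("]}").toString();
    }
}
```

Because the exporter depends only on the interface, another format (XML, Thrift, ...) could be added
in the same layer without touching the matrix class.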

On the other hand, if the matrix/vector case can be handled simply and if an almost general
representation can be adopted for several use cases, then it could be interesting
to use it even in low-level libraries. In this case, I think either XML or JSON would be nice.
I personally prefer XML, but that is really not the point. Once again, in this case I would avoid
binding data and meta-data too tightly. This would allow simple implementations
and would be easier to extend if we want. For example, a dense matrix would have a structure
that is a simple big array of numbers, with the column labels either above or below it but
not mixed within the array.
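
As a rough illustration of keeping labels beside the numbers rather than inside them, the sketch
below writes a dense matrix as JSON with an optional, separate label list. The class name
LabeledMatrixJson and the field names "columnLabels" and "data" are assumptions made up for this
example; nothing here is an agreed format.

```java
// Hypothetical writer: labels and numbers live in separate JSON fields.
final class LabeledMatrixJson {
    private LabeledMatrixJson() {}

    // columnLabels may be null; readers that ignore labels still find "data" unchanged.
    // Assumption: entries are finite (NaN and Infinity are not valid JSON numbers).
    static String write(double[][] data, String[] columnLabels) {
        StringBuilder sb = new StringBuilder("{");
        if (columnLabels != null) {
            sb.append("\"columnLabels\":[");
            for (int j = 0; j < columnLabels.length; j++) {
                if (j > 0) sb.append(',');
                sb.append('"').append(escape(columnLabels[j])).append('"');
            }
            sb.append("],");
        }
        sb.append("\"data\":[");
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('[');
            for (int j = 0; j < data[i].length; j++) {
                if (j > 0) sb.append(',');
                sb.append(data[i][j]);
            }
            sb.append(']');
        }
        return sb.append("]}").toString();
    }

    // Minimal escaping of backslash and double quote only; not a full JSON string encoder.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With this layout, a consumer that only wants the numbers reads "data" and never sees the labels,
which is the property that makes attributes transparent to those who do not use them.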

I'm not sure this comment answers your question though.


----- Original Message -----
From: "Ted Dunning" <>
To: "Commons Users List" <>
Sent: Tuesday 21 October 2008 07:34:42 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm
/ Vienna
Subject: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

Luc and other commons math folk: 

Do you guys have opinions about serialization formats for matrices (both dense and sparse,
both with and without row, column and cell attributes)? 

---------- Forwarded message ---------- 
From: Jeff Eastman < > 
Date: Mon, Oct 20, 2008 at 10:03 PM 
Subject: Re: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices 

Ted Dunning wrote: 

I see what you mean. 

To repeat in other words, the problems that need to be solved are: 

a) there are many uses already so adding attributes should be transparent to 
those who don't use them 

b) the encoding should not be ad hoc because this would be our second ad hoc 
encoding and only one should ever be allowed before using a standard 


So here is a (kind of) concrete proposal: 

a) use JSON or Thrift for concrete syntax 

Any preferences here? This might also impact other Mahout packages in the future, so everybody
please weigh in. In general, it seems that having a common, public encoding for matrix and
vector data would help users mix and match the Mahout services. What are the requirements
of these other services? From inspection, it looks like only the clustering packages use them



