commons-user mailing list archives

From Rory Winston <rory.wins...@gmail.com>
Subject Re: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices
Date Tue, 21 Oct 2008 20:53:17 GMT
What are the requirements here? Obviously something like XML or JSON 
provides readability, numerous parser implementations across languages, 
and relatively easy extensibility, at the cost of increased verbosity 
and reduced performance. Features like row/column labels would be 
straightforward - there are also numerous potential ways to represent 
sparse matrices, and attaching metadata to a matrix is no problem. But 
this may tend to incur a certain amount of bloat.

Rory
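
For illustration only, here is a minimal sketch of what a JSON encoding with row/column labels and sparse entries could look like. The class name, the field names (rowLabels, columnLabels, entries) and the coordinate-triple layout are all assumptions made for this example; none of them is an existing Commons Math or Mahout format.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical JSON layout for a labeled sparse matrix; not an existing library format. */
public final class LabeledMatrixJson {

    /** Quotes a label as a JSON string (escapes only backslash and double quote). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a list of labels as a JSON array of strings. */
    static String labelArray(List<String> labels) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String label : labels) {
            joiner.add(quote(label));
        }
        return joiner.toString();
    }

    /** Only non-zero cells are written, each as a [row, column, value] triple (0-based). */
    static String toJson(List<String> rowLabels, List<String> columnLabels,
                         int[] rowIdx, int[] colIdx, double[] values) {
        StringJoiner entries = new StringJoiner(",", "[", "]");
        for (int k = 0; k < values.length; k++) {
            entries.add("[" + rowIdx[k] + "," + colIdx[k] + "," + values[k] + "]");
        }
        return "{\"rowLabels\":" + labelArray(rowLabels)
                + ",\"columnLabels\":" + labelArray(columnLabels)
                + ",\"entries\":" + entries + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(
                List.of("alice", "bob"), List.of("book", "film"),
                new int[] {0, 1}, new int[] {1, 0}, new double[] {4.5, 2.0}));
    }
}
```

Even for two stored values, most of the output is structure rather than data, which is the verbosity trade-off mentioned above.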

luc.maisonobe@free.fr wrote:
> Well, in this simple situation, I agree it would be interesting to have
> an external representation format with row/column labels.
>
> I have only one suggestion: if a system (for example commons-math) does not
> provide the labels, there should be a default value generated in the text
> representation. An obvious candidate is the index (starting from 1)
> in string format. The idea is to keep things simple by avoiding optional parts.
>
> Luc
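
A minimal sketch of the default-label idea above, assuming a hypothetical helper (DefaultLabels.orDefault is invented for this example and is not part of commons-math): when the producing system supplies no labels, the writer generates "1", "2", ... so that the text representation always carries labels and has no optional part.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: fills in index-based labels when none are supplied. */
public final class DefaultLabels {

    /**
     * Returns the supplied labels, or the 1-based indices "1".."n" in string
     * form when labels is null.
     */
    static List<String> orDefault(List<String> labels, int n) {
        if (labels != null) {
            if (labels.size() != n) {
                throw new IllegalArgumentException(
                        "expected " + n + " labels, got " + labels.size());
            }
            return labels;
        }
        List<String> generated = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            generated.add(Integer.toString(i));
        }
        return generated;
    }

    public static void main(String[] args) {
        System.out.println(orDefault(null, 3));                   // generated labels
        System.out.println(orDefault(List.of("u1", "u2"), 2));    // supplied labels kept
    }
}
```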
>
> ----- Original Message -----
> From: "Ted Dunning" <ted.dunning@gmail.com>
> To: "Commons Users List" <user@commons.apache.org>
> Sent: Tuesday, 21 October 2008 12:31:54 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices
>
> I should provide more context.
>
> The meta-data (attributes) that are envisioned here are typically going to
> be row and column labels.  This is extremely helpful for applications
> such as recommendation engines.  It is entirely possible to hold this
> meta-data outside the matrices, but it is very useful to keep it inside so
> that, for example, label-aware matrix products can be implemented without
> having to externally intersect label tables and permute the matrices in
> order to make a normal product work correctly.
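
One way to read the label-aware product described above, as a rough sketch: column k of A is paired with whichever row of B carries the same label, so neither matrix has to be permuted beforehand. The class and method names are hypothetical, and skipping labels that appear on only one side (treating the missing row or column as zero) is an assumption of this example, not something stated in the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a product that matches A's columns to B's rows by label. */
public final class LabelAwareProduct {

    static double[][] multiply(double[][] a, List<String> aColumnLabels,
                               double[][] b, List<String> bRowLabels) {
        // Index B's rows by label so each column of A can find its partner row.
        Map<String, Integer> bRowOf = new HashMap<>();
        for (int r = 0; r < bRowLabels.size(); r++) {
            bRowOf.put(bRowLabels.get(r), r);
        }
        int rows = a.length;
        int cols = b.length == 0 ? 0 : b[0].length;
        double[][] c = new double[rows][cols];
        for (int k = 0; k < aColumnLabels.size(); k++) {
            Integer r = bRowOf.get(aColumnLabels.get(k));
            if (r == null) {
                continue; // label absent from B: contributes nothing (assumption)
            }
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    c[i][j] += a[i][k] * b[r][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}};               // columns labeled x, y
        double[][] b = {{10.0}, {20.0}, {30.0}};   // rows labeled y, z, x
        double[][] c = multiply(a, List.of("x", "y"), b, List.of("y", "z", "x"));
        System.out.println(c[0][0]);               // 1*30 (x) + 2*10 (y)
    }
}
```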
>
> Meta-data indicating things such as bandedness or sparsity is not part of
> this use case, but it is viable matrix-level meta-data as well.
>
> On Tue, Oct 21, 2008 at 1:54 AM, <luc.maisonobe@free.fr> wrote:
>
>   
>> I am a little puzzled by this topic.
>>
>> On the one hand, I tried many times to do such things and always failed. I
>> now really think persistence/serialization/transfer/interoperability is a
>> complex task by itself that is completely out of scope for very low level
>> components. It already belongs to a middle level layer (did I say
>> middleware?).
>>
>> There are many different use cases for data storage/transfer from within an
>> internal algorithm representation to something more external or more
>> long-lived. In some cases, basic data will be enough and its meaning will
>> already be known to both sides of the communication, so meta-data will be
>> cumbersome. In other cases meta-data are a great improvement (think matrix
>> shapes or non-null elements in sparse cases) but data can still be exchanged
>> without them. In still other cases meta-data are mandatory. Nor will
>> everybody agree on what meta-data should contain.
>>
>> For these reasons, I tend to promote a separated approach: low level layers
>> provide access to basic information (both data and things that could be
>> considered from outside as meta-data) through their API (getEntry(i, j),
>> getRowDimension(), isTriangular() ...) and a dedicated project in the middle
>> level layer uses it for externalization. This project would already be
>> difficult enough.
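
A small sketch of that separation, under stated assumptions: the MatrixView interface below is a stand-in defined only for this example (it borrows the accessor names getRowDimension() and getEntry(i, j) quoted above, plus an assumed getColumnDimension()), and the text layout written by the externalizer is invented. The point is only that the writer lives outside the matrix class and needs nothing beyond its public accessors.

```java
/** Hypothetical externalizer that relies only on a matrix's public accessors. */
public final class MatrixExternalizer {

    /** Stand-in for the low-level API; defined here so the example is self-contained. */
    interface MatrixView {
        int getRowDimension();
        int getColumnDimension();
        double getEntry(int row, int column);
    }

    /** Writes "rows cols" on the first line, then one space-separated line per row. */
    static String write(MatrixView m) {
        StringBuilder sb = new StringBuilder();
        sb.append(m.getRowDimension()).append(' ').append(m.getColumnDimension()).append('\n');
        for (int i = 0; i < m.getRowDimension(); i++) {
            for (int j = 0; j < m.getColumnDimension(); j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(m.getEntry(i, j));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Adapts a plain rectangular array to the accessor interface. */
    static MatrixView of(double[][] data) {
        return new MatrixView() {
            public int getRowDimension() { return data.length; }
            public int getColumnDimension() { return data.length == 0 ? 0 : data[0].length; }
            public double getEntry(int row, int column) { return data[row][column]; }
        };
    }

    public static void main(String[] args) {
        System.out.print(write(of(new double[][] {{1.0, 2.5}, {0.0, 3.0}})));
    }
}
```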
>>
>> On the other hand, if the matrix/vector case can be handled simply and if
>> an almost general representation can already be adopted for several use
>> cases, then it could be interesting to use it even in low level libraries.
>> In this case, I think either XML or JSON would be nice. I personally prefer
>> XML, but this really is not the point. Once again, in this case I would
>> avoid binding data and meta-data too tightly. This would allow simple
>> implementations and would be easier to extend if we want. For
>> example a dense matrix would have a structure that is one simple big array
>> of numbers, the column labels being either above or below but not mixed
>> within the array.
>>
>> I'm not sure this comment answers your question though.
>>
>> Luc
>>
>> ----- Original Message -----
>> From: "Ted Dunning" <ted.dunning@gmail.com>
>> To: "Commons Users List" <user@commons.apache.org>
>> Sent: Tuesday, 21 October 2008 07:34:42 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
>> Subject: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices
>>
>>
>>
>>
>> Luc and other commons math folk:
>>
>> Do you guys have opinions about serialization formats for matrices (both
>> dense and sparse, both with and without row, column and cell attributes)?
>>
>>
>> ---------- Forwarded message ----------
>> From: Jeff Eastman < jdog@windwardsolutions.com >
>> Date: Mon, Oct 20, 2008 at 10:03 PM
>> Subject: Re: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors
>> and Matrices
>> To: mahout-dev@lucene.apache.org
>>
>>
>>
>> Ted Dunning wrote:
>>
>>
>> I see what you mean.
>>
>> To repeat in other words, the problems that need to be solved are:
>>
>> a) there are many uses already so adding attributes should be transparent
>> to those who don't use them
>>
>> b) the encoding should not be ad hoc because this would be our second ad
>> hoc encoding and only one should ever be allowed before using a standard
>>
>> +1
>>
>>
>>
>> So here is a (kind of) concrete proposal:
>>
>> a) use JSON or Thrift for concrete syntax
>>
>> Any preferences here? This might also impact other Mahout packages in the
>> future, so everybody please weigh in. In general, it seems that having a
>> common, public encoding for matrix and vector data would help users mix and
>> match the Mahout services. What are the requirements of these other
>> services? From inspection, it looks like only the clustering packages use
>> them currently.
>>
>> Jeff
>>
>>
>>
>> --
>> ted
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>
>>
>>     
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

