commons-user mailing list archives

From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices
Date Tue, 21 Oct 2008 10:31:54 GMT
I should provide more context.

The meta-data (attributes) that are envisioned here are typically going to
be row and column labels.  This is extremely helpful for applications
such as recommendation engines.  It is entirely possible to hold this
meta-data outside the matrices, but it is very useful to keep it inside so
that, for example, label-aware matrix products can be implemented without
having to externally intersect label tables and permute the matrices in
order to make a normal product work correctly.
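
To make "label-aware product" concrete, here is a minimal Java sketch.
LabeledMatrix and its methods are hypothetical names invented for this
illustration, not an existing Commons Math or Mahout API, and treating a
label that appears on only one side as contributing zero is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dense matrix that carries its own row and column labels. */
final class LabeledMatrix {
    final List<String> rowLabels;
    final List<String> colLabels;
    final double[][] data; // data[i][j] belongs to (rowLabels.get(i), colLabels.get(j))

    LabeledMatrix(List<String> rowLabels, List<String> colLabels, double[][] data) {
        if (data.length != rowLabels.size()) {
            throw new IllegalArgumentException("row label count does not match data");
        }
        for (double[] row : data) {
            if (row.length != colLabels.size()) {
                throw new IllegalArgumentException("column label count does not match data");
            }
        }
        this.rowLabels = new ArrayList<>(rowLabels);
        this.colLabels = new ArrayList<>(colLabels);
        this.data = data;
    }

    /**
     * Label-aware product: the column labels of this matrix are matched
     * against the row labels of the other matrix by name, not by position.
     * Assumption: a label present on only one side contributes nothing.
     */
    LabeledMatrix multiply(LabeledMatrix other) {
        // Index the other matrix's rows by label so no external permutation is needed.
        Map<String, Integer> otherRowIndex = new HashMap<>();
        for (int k = 0; k < other.rowLabels.size(); k++) {
            otherRowIndex.put(other.rowLabels.get(k), k);
        }
        double[][] out = new double[rowLabels.size()][other.colLabels.size()];
        for (int j = 0; j < colLabels.size(); j++) {
            Integer k = otherRowIndex.get(colLabels.get(j));
            if (k == null) {
                continue; // label missing on the other side
            }
            for (int i = 0; i < rowLabels.size(); i++) {
                double a = data[i][j];
                for (int c = 0; c < other.colLabels.size(); c++) {
                    out[i][c] += a * other.data[k][c];
                }
            }
        }
        return new LabeledMatrix(rowLabels, other.colLabels, out);
    }

    /** Looks up an entry by its row and column labels. */
    double get(String row, String col) {
        return data[rowLabels.indexOf(row)][colLabels.indexOf(col)];
    }
}
```

With labels held outside the matrices, the caller would have to intersect
the two label tables and reorder one operand before calling an ordinary
multiply; here that bookkeeping lives inside multiply().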

Meta-data indicating things such as bandedness or sparsity is not part of
this use case, but it is viable matrix-level meta-data as well.

On Tue, Oct 21, 2008 at 1:54 AM, <luc.maisonobe@free.fr> wrote:

> I am a little puzzled by this topic.
>
> On the one hand, I tried many times to do such things and always failed. I
> now really think persistence/serialization/transfer/interoperability is a
> complex task by itself that is completely out of scope for very low-level
> components. It already belongs to a middle-level layer (did I say middleware
> ?).
>
> There are many different use cases for data storage/transfer from within an
> internal algorithm representation to something more external or more
> long-lived. In some cases, basic data will be enough and its meaning will
> already be known from both sides of communication so meta-data will be
> cumbersome. In other cases meta-data are a great improvement (think matrix
> shapes or non-null elements in sparse cases) but data can still be exchanged
> without them. In still other cases meta-data are mandatory. Also, nobody
> will agree on what the meta-data should contain.
>
> For these reasons, I tend to promote a separated approach: low level layers
> provide access to basic information (both data and things that could be
> considered from outside as meta-data) through their API (getEntry(i, j),
> getRowDimension(), isTriangular() ...) and a dedicated project from middle
> level layer uses it for externalization. This project would already be
> difficult enough.
>
> On the other hand, if the matrix/vector case can be handled simply and if
> an almost general representation can already be adopted for several use
> cases, then it could be interesting to use it even in low level libraries.
> In this case, I think either XML or JSON would be nice. I personally prefer
> XML, but this really is not the point. Once again, in this case I would
> avoid binding data and meta-data too tightly. This would allow simple
> implementations and would be easier to extend if we want. For
> example a dense matrix would have some structure that is a simple big array
> of numbers, the column labels being either above or below but not mixed
> within the array.
>
> I'm not sure this comment answers your question though.
>
> Luc
>
> ----- Original Message -----
> From: "Ted Dunning" <ted.dunning@gmail.com>
> To: "Commons Users List" <user@commons.apache.org>
> Sent: Tuesday, 21 October 2008 07:34:42 GMT +01:00 Amsterdam / Berlin /
> Bern / Rome / Stockholm / Vienna
> Subject: [math] Fwd: [jira] Commented: (MAHOUT-65) Add Element Labels to
> Vectors and Matrices
>
>
>
>
> Luc and other commons math folk:
>
> Do you guys have opinions about serialization formats for matrices (both
> dense and sparse, both with and without row, column and cell attributes)?
>
>
> ---------- Forwarded message ----------
> From: Jeff Eastman < jdog@windwardsolutions.com >
> Date: Mon, Oct 20, 2008 at 10:03 PM
> Subject: Re: [jira] Commented: (MAHOUT-65) Add Element Labels to Vectors
> and Matrices
> To: mahout-dev@lucene.apache.org
>
>
>
> Ted Dunning wrote:
>
>
> I see what you mean.
>
> To repeat in other words, the problems that need to be solved are:
>
> a) there are many uses already so adding attributes should be transparent
> to those who don't use them
>
> b) the encoding should not be ad hoc because this would be our second ad
> hoc encoding and only one should ever be allowed before using a standard
>
> +1
>
>
>
> So here is a (kind of) concrete proposal:
>
> a) use JSON or Thrift for concrete syntax
>
> Any preferences here? This might also impact other Mahout packages in the
> future, so everybody please weigh in. In general, it seems that having a
> common, public encoding for matrix and vector data would help users mix and
> match the Mahout services. What are the requirements of these other
> services? From inspection, it looks like only the clustering packages use
> them currently.
>
> Jeff
>
>
>
> --
> ted
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>
>
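
As a rough illustration of the layout Luc describes in the quoted text (one
flat array of numbers, with labels stored beside it and not mixed into it),
here is a Java sketch that emits JSON. The class name MatrixJson and the
field names rows, cols, rowLabels, colLabels and data are all assumptions
made up for this example; nothing here is an agreed format.

```java
import java.util.List;

/** Hypothetical writer: dense matrix as JSON, labels kept apart from the numbers. */
final class MatrixJson {

    /**
     * Serializes a dense matrix in row-major order. Either label list may be
     * null, in which case its field is omitted entirely, so callers that do
     * not use labels see only the shape and the data.
     * Assumption: all entries are finite (NaN and infinities are not valid JSON).
     */
    static String toJson(double[][] data, List<String> rowLabels, List<String> colLabels) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"rows\":").append(data.length);
        sb.append(",\"cols\":").append(data.length == 0 ? 0 : data[0].length);
        if (rowLabels != null) {
            sb.append(",\"rowLabels\":").append(labels(rowLabels));
        }
        if (colLabels != null) {
            sb.append(",\"colLabels\":").append(labels(colLabels));
        }
        sb.append(",\"data\":[");
        boolean first = true;
        for (double[] row : data) {
            for (double v : row) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(v);
                first = false;
            }
        }
        return sb.append("]}").toString();
    }

    /** Writes a JSON string array; only backslash and double quote are escaped in this sketch. */
    private static String labels(List<String> labels) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < labels.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String escaped = labels.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(']').toString();
    }
}
```

Because the label fields are optional and separate from the data array, a
reader that knows nothing about labels can still parse the shape and the
numbers, which is one way to keep attributes transparent to existing users.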


-- 
ted