systemml-dev mailing list archives

From Shirish Tatikonda <shirish.tatiko...@gmail.com>
Subject Re: Matrix Market format with metadata file
Date Tue, 16 Feb 2016 00:58:04 GMT
Ok. Cool.

On Mon, Feb 15, 2016 at 4:57 PM, Deron Eriksson <deroneriksson@gmail.com>
wrote:

> Very good eye! I used "m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4,
> cols=3)" to generate the mm file, so the 4th row did indeed contain all
> zeros.
>
>
> On Mon, Feb 15, 2016 at 4:50 PM, Shirish Tatikonda <
> shirish.tatikonda@gmail.com> wrote:
>
> > Btw (Just to be precise), in your example of "mm" file.. the metadata is
> > "4 3 6" but the following non-zero values are only up to row number 3. So,
> > either it was a typo or the 4th row contains all zeros.
> >
> >
> >
> > On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda <
> > shirish.tatikonda@gmail.com> wrote:
> >
> > > Both "mm" and "text" formats are identical except for a couple of
> > > differences:
> > >
> > > > 1) for "mm": the matrix metadata is included in the first two lines;
> > > > for "text": the metadata is present in the associated .mtd file
> > > > 2) "mm" data must be in a single file (i.e., no *part* files), whereas
> > > > "text" data can span multiple *part* files (like any other file on
> > > > HDFS).
> > >
> > > The support for "mm" is created mainly for the purpose of
> > > importing/exporting data in the format that R likes.
> > >
> > > Shirish
> > >
> > > > On Mon, Feb 15, 2016 at 4:17 PM, Deron Eriksson <deroneriksson@gmail.com>
> > > > wrote:
> > >
> > >> Hi,
> > >>
> > > >> I have a question with regard to text vs mm. Isn't the mm coordinate
> > > >> format identical to the text format, except that the mm data file
> > > >> happens to include the metadata line for rows, cols, and nnzs? If so,
> > > >> shouldn't they scale the same, since the text row values (i,j,v)
> > > >> correspond to the mm rows?
> > >>
> > >> If we have the following MM:
> > >> %%MatrixMarket matrix coordinate real general
> > >> 4 3 6
> > >> 1 1 1.0
> > >> 1 2 2.0
> > >> 1 3 3.0
> > >> 3 1 7.0
> > >> 3 2 8.0
> > >> 3 3 9.0
> > >>
> > >> The corresponding text format (with accompanying metadata file) is:
> > >> 1 1 1.0
> > >> 1 2 2.0
> > >> 1 3 3.0
> > >> 3 1 7.0
> > >> 3 2 8.0
> > >> 3 3 9.0
> > >>
> > >> So aren't these formats essentially the same?
> > >>
> > >> Deron
> > >>
> > >>
> > >> On Mon, Feb 15, 2016 at 3:56 PM, Matthias Boehm <mboehm@us.ibm.com>
> > >> wrote:
> > >>
> > > >> > The meta data file is still useful in order to get the format. In
> > > >> > case of matrix market, errors will be raised if included meta data is
> > > >> > inconsistent. So no, we should not disallow specifying the meta data.
> > > >> > In general, we anyway recommend using text (textcell) instead of mm
> > > >> > (matrix market) for scalability reasons.
> > >> >
> > >> > Regards,
> > >> > Matthias
> > >> >
> > >> > From: Deron Eriksson <deroneriksson@gmail.com>
> > >> > To: dev@systemml.incubator.apache.org
> > >> > Date: 02/15/2016 03:45 PM
> > >> > Subject: Matrix Market format with metadata file
> > >> > ------------------------------
> > >> >
> > >> >
> > >> >
> > >> > Hi,
> > >> >
> > > >> > The Matrix Market coordinate format contains # rows, # columns, and #
> > > >> > non-zero values as metadata near the top of a matrix data file.
> > > >> >
> > > >> > If I write a matrix in mm format using SystemML, no metadata file is
> > > >> > created since the metadata is stored within the data file.
> > > >> >
> > > >> > However, when reading a matrix with mm format, I can supply a metadata
> > > >> > file, even though metadata exists in the matrix data file. Is there any
> > > >> > reason for this, or should this be disallowed since the metadata file is
> > > >> > redundant and can cause confusion, since metadata values can then be
> > > >> > specified in two places, which then brings up the question, "which
> > > >> > metadata value should be used"?
> > >> >
> > >> > Deron
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
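
The relationship discussed in this thread (an "mm" file is the "text" cell
data with the dimensions carried in its first two lines) can be sketched
roughly as below. This is only an illustration, not SystemML code: the
function name split_matrix_market and the keys of the returned dictionary are
hypothetical, and the dictionary is not meant to reproduce the actual layout
of a .mtd file. The consistency check only mirrors the idea that inconsistent
metadata should raise an error.

```python
def split_matrix_market(mm_text):
    """Split Matrix Market coordinate text into (metadata, cell_lines).

    Hypothetical helper for illustration; not part of SystemML.
    """
    lines = [ln.strip() for ln in mm_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket banner line")

    # Drop the banner and any further '%' comment lines.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    if not body:
        raise ValueError("missing size line (rows cols nnz)")

    # Second metadata line: number of rows, columns, and non-zero values.
    rows, cols, nnz = (int(tok) for tok in body[0].split())

    # Everything after the size line is plain "i j v" cell data,
    # i.e. the same lines a "text" format file would contain.
    cells = body[1:]
    if len(cells) != nnz:
        raise ValueError(f"header says nnz={nnz}, found {len(cells)} cells")
    for cell in cells:
        i, j, _value = cell.split()
        if not (1 <= int(i) <= rows and 1 <= int(j) <= cols):
            raise ValueError(f"cell '{cell}' is outside {rows}x{cols}")

    # Illustrative metadata only; key names are assumptions.
    return {"rows": rows, "cols": cols, "nnz": nnz}, cells


# The example matrix from this thread: row 4 is all zeros, so it has no cells.
SAMPLE_MM = """%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
"""

meta, cells = split_matrix_market(SAMPLE_MM)
```

Here meta carries the dimensions that the "text" format would keep in its
separate metadata file, and cells holds the lines that would make up the
"text" data file itself.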
