mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Multi-file Matrices?
Date Mon, 14 Nov 2011 07:28:08 GMT
So, a DRM is a set of one or more SequenceFiles, where each int/vector
pair is a row number and a full-width row vector? Then the ordering is in
the IntWritable keys.
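
A minimal sketch of that reading, in plain Java with no Hadoop or Mahout
classes. The names DrmRowOrderSketch, Row, and mergeParts are hypothetical and
only stand in for part files holding IntWritable/VectorWritable records. It
assumes each row number appears in exactly one part, and shows that row order
can be recovered from the keys alone, whatever order the parts arrive in:

```java
import java.util.List;
import java.util.TreeMap;

// Plain-Java stand-in: this does not use the real SequenceFile or
// DistributedRowMatrix APIs. All names here are hypothetical.
class DrmRowOrderSketch {

    // One (row number, row vector) pair, standing in for an
    // IntWritable/VectorWritable record inside a part file.
    static final class Row {
        final int index;
        final double[] values;

        Row(int index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    // Collect the rows of every part and order them by row number.
    // Neither the order of the parts nor the order of the records
    // inside a part is assumed to be meaningful.
    static TreeMap<Integer, double[]> mergeParts(List<List<Row>> parts) {
        TreeMap<Integer, double[]> rows = new TreeMap<>();
        for (List<Row> part : parts) {
            for (Row row : part) {
                if (rows.put(row.index, row.values) != null) {
                    throw new IllegalStateException("duplicate row " + row.index);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two "part files" with rows in no particular order.
        List<Row> part0 = List.of(
                new Row(3, new double[] {0, 1}),
                new Row(0, new double[] {1, 0}));
        List<Row> part1 = List.of(
                new Row(2, new double[] {1, 1}),
                new Row(1, new double[] {0, 0}));

        // The parts are passed in reverse order on purpose.
        TreeMap<Integer, double[]> rows = mergeParts(List.of(part1, part0));
        System.out.println(rows.keySet()); // prints [0, 1, 2, 3]
    }
}
```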

On Sun, Nov 13, 2011 at 10:56 PM, Jake Mannix <jake.mannix@gmail.com> wrote:

> I don't think we currently make any guarantees about sort-order of the
> parts themselves, or among the various part-files, as they may be created
> by any number of map-reduce jobs, and are then consumed by map-reduce jobs
> which have no inter-process communication.
>
> What would ordering even *mean* among map-inputs?  Or are you just
> referring to ordering within each chunk itself?  Or for non-MR use of the
> files?
>
>  -jake
>
> On Sun, Nov 13, 2011 at 10:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Make sure that the files can be ordered, of course.  Losing the ordering
> > can be really bad.
> >
> > On Sun, Nov 13, 2011 at 10:34 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> >
> > > Yeah, in particular, DistributedRowMatrix "is" simply a
> > > SequenceFile<IntWritable,VectorWritable>, when in its serialized form.
> > > As such, this "file" can be (and typically is) a series of part-* files
> > > in a directory (typically on HDFS).
> > >
> > >  -jake
> > >
> > > On Sun, Nov 13, 2011 at 10:23 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > >
> > > > It's my understanding that a DRM can be multifile. In fact, stuff like
> > > > seq2sparse will produce multifile output, being an MR job itself.
> > > > On Nov 12, 2011 3:23 PM, "Lance Norskog" <goksron@gmail.com> wrote:
> > > >
> > > > > Is there a convention for multi-file matrices? For example, the
> > > > > DistributedRowMatrix?
> > > > >
> > > > > --
> > > > > Lance Norskog
> > > > > goksron@gmail.com
> > > > >
> > > >
> > >
> >
>



-- 
Lance Norskog
goksron@gmail.com
