mahout-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Sequential access to VectorWritable content proposal.
Date Mon, 13 Dec 2010 19:06:45 GMT
I also think it is not very common, but it would be a collateral benefit.

On Mon, Dec 13, 2010 at 11:05 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> How common is it that a row won't fit in memory?  My experience is that
> essentially all rows that I am interested in will fit in very modest
> amounts of memory, but that row by row handling is imperative.
>
> Is this just gilding the lily?
>
> On Mon, Dec 13, 2010 at 10:24 AM, Jake Mannix <jake.mannix@gmail.com>
> wrote:
>
> > Hey Dmitriy,
> >
> >  I've also been playing around with a VectorWritable format which is
> > backed by a SequenceFile, but I've been focussed on the case where it's
> > essentially the entire matrix, and the rows don't fit into memory.  This
> > seems different than your current use case, however - you just want
> > (relatively) small vectors to load faster, right?
> >
> >  -jake
> >
> > On Mon, Dec 13, 2010 at 10:18 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > Interesting idea.
> > >
> > > Would this introduce a new vector type that only allows iterating
> > > through the elements once?
> > >
> > > On Mon, Dec 13, 2010 at 9:49 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to submit a patch to VectorWritable that allows for
> > > > streaming access to vector elements without having to prebuffer all
> > > > of them first. (The current code allows only the latter.)
> > > >
> > > > That patch would remove one of the memory usage issues in the
> > > > current Stochastic SVD implementation and effectively open up the
> > > > memory bound for n of the SVD work. (The value I see is not in
> > > > opening up the bound, though, but in being more efficient in memory
> > > > use, thus essentially speeding up the computation.)
> > > >
> > > > If it's OK, I would like to create a JIRA issue and provide a patch
> > > > for it.
> > > >
> > > > Another issue is to provide an SSVD patch that depends on that patch
> > > > for VectorWritable.
> > > >
> > > > Thank you.
> > > > -Dmitriy
> > > >
> > >
> >
>
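
For readers of the archive, the difference between prebuffered and streaming access discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed patch and not Mahout's actual VectorWritable API: the class StreamingVectorSketch, the ElementHandler callback, and the on-disk layout (an int length followed by that many doubles) are all hypothetical. It uses only java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch; none of these names exist in Mahout.
final class StreamingVectorSketch {

    /** Hypothetical callback invoked once per element, in storage order. */
    interface ElementHandler {
        void onElement(int index, double value);
    }

    /** Serializes a dense vector using an assumed layout: int length, then each double. */
    static byte[] writeDense(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Prebuffered access: materializes the whole vector before the caller sees any element. */
    static double[] readBuffered(DataInput in) throws IOException {
        int n = in.readInt();
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = in.readDouble();
        }
        return values;
    }

    /** Streaming access: hands each element to the handler as it is read; nothing is retained. */
    static int readStreaming(DataInput in, ElementHandler handler) throws IOException {
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            handler.onElement(i, in.readDouble());
        }
        return n;
    }

    /** Example consumer: accumulates a sum of squares in a single pass over the stream. */
    static double sumOfSquares(byte[] serialized) throws IOException {
        double[] acc = new double[1]; // one-element array so the lambda can update it
        DataInput in = new DataInputStream(new ByteArrayInputStream(serialized));
        readStreaming(in, (index, value) -> acc[0] += value * value);
        return acc[0];
    }
}
```

In this sketch the streaming reader is single-pass: once an element has been handed to the handler it cannot be revisited without re-reading the input, which is the trade-off raised in the question about iterating through the elements only once.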
