mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: LDA from Lucene Indexes
Date Wed, 04 May 2011 17:46:19 GMT
Pipelining is good for abstraction and really bad for performance (in the
map-reduce world).

My thought is that we could have a multipurpose tool.  Input would be a
Lucene index, and the program would read term vectors or original text,
whichever is available.  Output would be either a sequence file full of
text or a sequence file full of vectors.

This would allow pipelining if interesting, but would also allow the common
case of generating vectors to proceed in one step.
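
To make the idea concrete, here is a rough sketch of the control flow only.
It is not Mahout or Lucene code: the index is modeled as a list of plain
dicts, the "sequence file" as a stream of (key, value) pairs, and the names
export and tokenize, the field names, and the output flag are all made up
for illustration.  A real tool would read through Lucene and write Hadoop
sequence files instead.

```python
import re
from collections import Counter


def tokenize(text):
    # Stand-in for a real analyzer: lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def export(docs, output="vectors"):
    """Yield (doc_id, value) pairs, mimicking sequence-file records.

    output="text"    -> value is the stored original text
    output="vectors" -> value is a term -> frequency dict
    """
    if output not in ("text", "vectors"):
        raise ValueError("output must be 'text' or 'vectors'")
    for doc in docs:
        doc_id = doc["id"]
        text = doc.get("text")                # stored text, if any
        term_vector = doc.get("term_vector")  # stored term vector, if any
        if output == "text":
            # Text cannot be rebuilt from a bare term vector, so documents
            # without stored text are skipped in this sketch.
            if text is not None:
                yield doc_id, text
        elif term_vector is not None:
            # Prefer the stored term vector when it is available.
            yield doc_id, dict(term_vector)
        elif text is not None:
            # Otherwise fall back to counting terms in the original text.
            yield doc_id, dict(Counter(tokenize(text)))


# Hypothetical stand-in for a Lucene index.
index = [
    {"id": "d1", "text": "LDA from Lucene indexes",
     "term_vector": {"lda": 1, "from": 1, "lucene": 1, "indexes": 1}},
    {"id": "d2", "text": "Lucene lucene vectors"},
    {"id": "d3", "term_vector": {"mahout": 2}},
]

vectors = dict(export(index, output="vectors"))  # the one-step common case
texts = dict(export(index, output="text"))       # feeds a later pipeline stage
```

The same pass over the index serves both uses: the text output can be fed to
a later stage if pipelining is wanted, while the vector output skips the
intermediate step.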

On Wed, May 4, 2011 at 10:41 AM, Jake Mannix <jake.mannix@gmail.com> wrote:

> On Wed, May 4, 2011 at 10:33 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > It might be that the right thing is to just tweak the current seq2sparse
> > process.
> >
> > Jake,
> >
> > is that what you were thinking?
> >
>
> Well seq2sparse is really for grabbing sequence files, and lucene.vector
> grabs lucene indexes... I was just imagining another script that takes
> lucene indexes and produces text files (or sequence files of text), so
> you can just pipeline it.
>
> I haven't thought about it too carefully, however.
>
