mahout-user mailing list archives

From Ted Dunning
Subject Re: Sequence file format for Kmeans, LDA, etc.
Date Fri, 13 Nov 2009 21:15:55 GMT
This thread, combined with the earlier discussion about the preferred way of
composing tools (writing scripts in Java), is beginning to make me think that
we need something like an HdfsMatrix and a LocalFileMatrix. These would simply
be wrappers around file names, but they would allow extraction of elements
(for debugging, diagnostics, and sequential implementations) and could be
passed to generic driver routines or received from generic conversion routines.

Should I open a JIRA?
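
To make the wrapper idea above concrete, here is a minimal sketch of what the
local-file variant might look like. Everything in it is an assumption made for
illustration: the FileBackedMatrix interface, the method names, and the plain
text format (one row per line, whitespace-separated values) are hypothetical
and are not existing Mahout APIs. An HDFS variant would presumably implement
the same interface on top of the Hadoop file system classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical local-file matrix: a thin wrapper around a file name. */
final class LocalFileMatrix implements FileBackedMatrix {
    private final Path path;

    LocalFileMatrix(Path path) {
        this.path = path;
    }

    /** Writes rows as text (assumed format) and returns a wrapper for the file. */
    static LocalFileMatrix write(Path path, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) sb.append(' ');
                sb.append(row[j]);
            }
            sb.append('\n');
        }
        try {
            Files.write(path, sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LocalFileMatrix(path);
    }

    /** The file name this matrix wraps; this is what a driver would receive. */
    public String location() {
        return path.toString();
    }

    /** Element access for debugging; re-reads the file on every call. */
    public double get(int row, int col) {
        try {
            List<String> lines = Files.readAllLines(path);
            String[] cells = lines.get(row).trim().split("\\s+");
            return Double.parseDouble(cells[col]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("matrix", ".txt");
        LocalFileMatrix m = write(tmp, new double[][] {{1.0, 2.0}, {3.0, 4.5}});
        System.out.println(m.location() + " [1][1] = " + m.get(1, 1));
        Files.delete(tmp);
    }
}

/** Hypothetical common interface for matrices identified by where they are stored. */
interface FileBackedMatrix {
    String location();

    double get(int row, int col);
}
```

The sketch deliberately re-reads the file on each get() call: the wrapper holds
only the file name, so it stays cheap to pass between tools, and element access
is meant for diagnostics rather than bulk computation.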

On Fri, Nov 13, 2009 at 11:54 AM, Grant Ingersoll wrote:

> Also, take a look at what the TfIdfDriver does for the classifier stuff.
> This is an M/R job for converting text into its format.  I think we can
> abstract that to be more general purpose and then move it under the Utils
> module.  The only thing that likely needs to change is whether we output the
> Writable for the classifier or whether we output a Vector.  That is my naive
> view at this point.

Ted Dunning, CTO
