mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: What about a universal input data handling mechanism for Mahout?
Date Sat, 06 Aug 2011 05:09:47 GMT
SequenceFiles are output by map-reduce jobs all the time.

You just wind up with lots of files instead of one.
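
To illustrate the point, here is a minimal sketch in plain Python, not the Hadoop API: each parallel task writes only its own output file, so nothing has to be written single-threaded, and a consumer reads the whole output directory as one dataset. The names `write_part`, `read_all_parts` and `run_job` are hypothetical, the `part-00000` naming is only loosely modeled on Hadoop's part files, and tab-separated text stands in for the real SequenceFile binary format.

```python
from __future__ import annotations

import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_part(out_dir: Path, task_id: int, records: list[tuple[str, str]]) -> Path:
    """Each task writes only its own file, so no two writers share a file."""
    part = out_dir / f"part-{task_id:05d}"
    with part.open("w", encoding="utf-8") as fh:
        for key, value in records:
            fh.write(f"{key}\t{value}\n")
    return part


def read_all_parts(out_dir: Path) -> list[tuple[str, str]]:
    """A consumer treats the directory as one dataset and reads every part file."""
    rows = []
    for part in sorted(out_dir.glob("part-*")):
        with part.open(encoding="utf-8") as fh:
            for line in fh:
                key, value = line.rstrip("\n").split("\t", 1)
                rows.append((key, value))
    return rows


def run_job(out_dir: Path, splits: list[list[tuple[str, str]]]) -> list[Path]:
    """Run one writer task per input split, in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(write_part, out_dir, task_id, split)
            for task_id, split in enumerate(splits)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    # Three input splits produce three part files instead of one big file.
    run_job(out_dir, [[("doc1", "v1"), ("doc2", "v2")], [("doc3", "v3")], [("doc4", "v4")]])
    print(sorted(p.name for p in out_dir.glob("part-*")))
    print(read_all_parts(out_dir))
```

In a real job the same shape applies: the output is a directory of part files, and downstream steps are pointed at the directory rather than at a single file.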

On Fri, Aug 5, 2011 at 10:02 PM, Xiaobo Gu <guxiaobo1982@gmail.com> wrote:

> It seems SequenceFiles can only be written single-threaded, so map-reduce
> style programming can't be used. Am I right?
>
> On Fri, Aug 5, 2011 at 10:51 PM, Xiaobo Gu <guxiaobo1982@gmail.com> wrote:
> > I will try to write a program named Csv2Seq, it will read all the csv
> > files under input recursively, and encode all the records as vectors,
> > and write all the encoded vectors into a sequenceFile of type
> > SequenceFile<Text, VectorWritable>, which can be consumed by
> > algorithms such as Naïve Bayes. I have planned the following input
> > parameters for the program:
>
