mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: DistributedRowMatrix - FileNotFoundException
Date Thu, 08 Mar 2012 05:33:16 GMT
The script examples/bin/asf-email-examples.sh shows how the Bayes
classifier takes raw text and creates int/vector sequence files.  You
can get a very small subset of the Apache mail archives.

Try running this example and watch the different files as the script
creates them. The Mahout job that does the conversion is seq2sparse.
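
For the space-separated matrix file itself, the text has to be parsed into
rows of doubles before anything can be written as IntWritable/VectorWritable
pairs. Here is a minimal sketch of that parsing step only, using nothing but
the JDK. MatrixTextParser and its methods are hypothetical names made up for
illustration; they are not part of Mahout. The sketch assumes one matrix row
per line and uses the line's position as the row index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Mahout class): turns space-separated text
// lines into rows of doubles, one row per non-blank line.
final class MatrixTextParser {
    private MatrixTextParser() {}

    /** Parses one line of whitespace-separated numbers into a row. */
    static double[] parseRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Parses all non-blank lines; the list index serves as the row index. */
    static List<double[]> parse(Iterable<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines instead of failing on them
            }
            rows.add(parseRow(line));
        }
        return rows;
    }
}
// Assumption: each parsed row i would then be wrapped, for example as
// new VectorWritable(new DenseVector(rows.get(i))), and appended under
// new IntWritable(i) with the SequenceFile.Writer quoted below.
```

Keeping the parsing separate from the Hadoop writer means it can be checked
on a few lines of text before any sequence file is produced.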

On Wed, Mar 7, 2012 at 8:20 PM, Paritosh Ranjan <pranjan@xebia.com> wrote:
> You will have to use org.apache.hadoop.io.SequenceFile.Writer to write a
> sequence file which can be used as an input.
>
> Something like,
>
> SequenceFile.Writer writer = new SequenceFile.Writer(fileSystem, conf,
>     pathToWrite, IntWritable.class, VectorWritable.class);
> // for all IntWritable, VectorWritable pairs
> writer.append(new IntWritable(theIntValue), new VectorWritable(theVector));
> // close the writer once all pairs have been appended
> writer.close();
>
> and then use this sequence file.
>
>
> On 08-03-2012 02:57, Sean Owen wrote:
>>
>> DistributedRowMatrix operates on IntWritable,VectorWritable in a
>> sequence file, and it looks like you're feeding text. No, it doesn't
>> accept some text-based format.
>>
>> On Wed, Mar 7, 2012 at 8:41 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ
>> <pmjimenez1983@hotmail.com>  wrote:
>>>
>>> Sorry but I can't understand how to do it.
>>>
>>> I have a single space-separated text file with my input matrix. To run
>>> DistributedRowMatrix with that file I need to convert the data to
>>> SequenceFile format.
>>>
>>> How can I do this with SequenceFileInputFormat? I have tried
>>> InputDriver, but without success.
>>>
>>> Thanks for your help.
>>>
>



-- 
Lance Norskog
goksron@gmail.com
