mahout-user mailing list archives

From Dan Filimon <dangeorge.fili...@gmail.com>
Subject Re: seqdirectory command in MapReduce
Date Sat, 16 Feb 2013 18:34:38 GMT
But why would this be a problem? As long as it's using HDFS to access
the files, it should be able to fetch the chunks from wherever they
might be in the cluster.

I don't see why it wouldn't work. Let us know if it works!
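To make the "standalone, not MapReduce" point concrete, here is a minimal sketch of what such a conversion amounts to. A single process walks the input directory and emits one (document name, document content) pair per file, which is the key/value layout described for seqdirectory further down the thread. This is not Mahout's SequenceFilesFromDirectory code: the class and method names are hypothetical. It also reads the local filesystem through java.nio and returns an in-memory map, where the real tool writes a Hadoop SequenceFile. That keeps the sketch runnable with the JDK alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; NOT Mahout's SequenceFilesFromDirectory.
public class StandaloneDirToPairs {

    // Reads every regular file under inputDir sequentially, in ONE process,
    // and returns (document name -> document content) pairs sorted by name.
    // Assumption: the key is the path relative to inputDir; the real tool's
    // exact key format may differ.
    public static Map<String, String> convert(Path inputDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(inputDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> pairs = new TreeMap<>();
        for (Path file : files) {
            String key = inputDir.relativize(file).toString();
            String value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny throwaway input directory to demonstrate the layout.
        Path dir = Files.createTempDirectory("docs");
        Files.write(dir.resolve("a.txt"), "first document".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "second document".getBytes(StandardCharsets.UTF_8));
        // One tab-separated "name<TAB>content" line per file.
        convert(dir).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Because every file passes through the same loop in one JVM, throughput is bounded by that single process. A MapReduce job would instead spread the files across map tasks. Reading the input from HDFS still works in the standalone case; it just is not parallelized.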

On Sat, Feb 16, 2013 at 7:38 PM, Claudio Reggiani <nophiq@gmail.com> wrote:
> Yes, thank you, Steve. And sorry for my unclear messages.
>
> Claudio
>
>
> 2013/2/16 Steve Chien <stvchien@gmail.com>
>
>> I think he meant that the code reads and converts the files from the
>> input directory as a standalone program, not as a MapReduce program...
>>
>> On Feb 16, 2013, at 11:22, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>
>> > Hi Claudio,
>> >
>> > Could you be more specific? What does 'MapReduce style' mean?
>> > seqdirectory should create sequence files from the documents in a
>> > folder, where the keys are the document names and the values are the
>> > documents' content.
>> >
>> > What do you need it to do?
>> >
>> > On Sat, Feb 16, 2013 at 5:55 PM, Claudio Reggiani <nophiq@gmail.com> wrote:
>> >> Hello,
>> >>
>> >> I have a text dataset. Running the "seqdirectory" command on it, I see
>> >> it's not written in MapReduce style (looking at the source code of
>> >> SequenceFilesFromDirectory confirms that).
>> >>
>> >> What if I have a big dataset stored in HDFS and I would like to convert
>> >> it to SequenceFile format? Do I need to create my own custom job, or
>> >> does seqdirectory do that?
>> >>
>> >> Thanks
>> >> Claudio Reggiani
>>
