mahout-user mailing list archives

From Andy Schlaikjer <andrew.schlaik...@gmail.com>
Subject Re: question on VectorWritable convertor in elephant-bird.
Date Tue, 15 May 2012 14:01:31 GMT
Yohan, that's a typo in VectorWritableConverter javadoc. I'll update today.

The SequenceFileStorage and ...Loader classes are in separate packages:

com.twitter.elephantbird.pig.load.SequenceFileLoader
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>

com.twitter.elephantbird.pig.store.SequenceFileStorage
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>

Both of these classes rely on the WritableConverter interface
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>.
They load converter classes at runtime, given the class names of the
converters you'd like to use for the key and value Writable instances, so
SequenceFileLoader itself does not need to import VectorWritableConverter.
When dealing with SequenceFile<IntWritable, VectorWritable> data, do this:

{{{

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);

}}}
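
To illustrate why no import is needed, here is a minimal Java sketch of resolving a class from its name at runtime. It is not elephant-bird's actual code: the Converter interface, UpperCaseConverter, and load() are hypothetical names made up for this example. The only assumption it shares with the real loader is that the named class must be on the classpath when the lookup happens.

```java
public class ConverterLoading {
    /** Hypothetical stand-in for a converter interface; not elephant-bird's API. */
    public interface Converter {
        String convert(String raw);
    }

    /** Hypothetical implementation; load() below never refers to it directly. */
    public static class UpperCaseConverter implements Converter {
        @Override
        public String convert(String raw) {
            return raw.toUpperCase();
        }
    }

    /**
     * Resolves a converter from its fully qualified class name at runtime.
     * Fails if the class is missing from the classpath or has no
     * accessible no-argument constructor.
     */
    public static Converter load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(Converter.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load converter " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is passed as a plain string, much like the '-c' argument above.
        Converter c = load(UpperCaseConverter.class.getName());
        System.out.println(c.convert("vector"));
    }
}
```

With this pattern, a misspelled class name or a missing jar only shows up when the script runs, not at compile time, so it is worth checking the class names and classpath first.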

Hope this helps!

Andy


On Mon, May 14, 2012 at 11:57 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Sounds like a class path issue.
>
> Sent from my iPhone
>
> On May 15, 2012, at 2:43 AM, Yohan Chin <yohan.jin@gmail.com> wrote:
>
>>
>> Hi,
>> Recently I've been trying to use elephant-bird to load Mahout results
>> into Pig.
>> I built elephant-bird and got the .jar file,
>> then followed the instructions below (written by Andy Schlaikjer):
>>
>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
>> ex)
>> pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
>> '-c $INT_CONVERTER',
>> '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
>> );
>> However, there is no SequenceFileLoader in the store folder, and
>> load/SequenceFileLoader.java doesn't import
>> "com.twitter.elephantbird.pig.mahout.VectorWritableConverter".
>>
>> Is there anything I've missed?
>>
>> Thanks a lot for this awesome api!
>>
