mahout-user mailing list archives

From Andy Schlaikjer <andrew.schlaik...@gmail.com>
Subject Re: question on VectorWritable convertor in elephant-bird.
Date Tue, 15 May 2012 15:29:23 GMT
Looking at my setup, I register Mahout jars for mahout-collections,
mahout-math, and mahout-core when using VectorWritableConverter, so the set
of register statements might look something like this:

{{{

REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';

}}}
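For reference, here is a minimal sketch of a complete script that puts these REGISTER statements together with the LOAD statement from the earlier message in this thread. The hdfs:///path/to/jars/ locations are the placeholders used above, and INPUT_PATH is a parameter you supply yourself. The converter and loader class names are the ones quoted in the error message further down.

{{{

-- Make elephant-bird and the Mahout classes it depends on visible to Pig.
-- Paths are placeholders; point them at wherever the jars actually live.
REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';

-- Load SequenceFile<IntWritable, VectorWritable> data. The loader lives in
-- the *load* package, not *store*.
pair = LOAD '$INPUT_PATH' USING com.twitter.elephantbird.pig.load.SequenceFileLoader (
  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse'
);

-- Print the schema Pig reports for the loaded relation.
DESCRIBE pair;

}}}

You can run it with something like pig -param INPUT_PATH=/path/to/vectors script.pig (both names are placeholders). If only the elephant-bird jar is registered, the LOAD fails with the NoClassDefFoundError for org/apache/mahout/math/Vector shown below.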


On Tue, May 15, 2012 at 8:15 AM, Andy Schlaikjer <andrew.schlaikjer@gmail.com> wrote:

> Yohan, sounds like you're almost there.
>
> You need to register both EB and Mahout jars so that when
> SequenceFileLoader class-loads VectorWritableConverter, the Mahout
> VectorWritable and Vector classes (and all of their dependencies) are also
> available.
>
> Andy
>
>
> On Tue, May 15, 2012 at 7:59 AM, Yohan Chin <yohan.jin@gmail.com> wrote:
>
>> Andy,
>> Thanks for your response.
>>
>> I tried it again with your suggestion, but I still get an error (below).
>> It seems I need to resolve the dependency on the Mahout classes used
>> in VectorWritableConverter.
>>
>> When I set up elephant-bird, I followed
>> https://github.com/kevinweil/elephant-bird and completed the quick-start
>> steps and the protocol-buffer and thrift 0.5 dependencies, which
>> produced path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.
>>
>> In the Pig code, I register path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.
>>
>> Should I set up the Mahout class dependencies separately?
>>
>> Thanks!
>>
>>
>> Error message:
>>
>> Unexpected internal error. could not instantiate
>> 'com.twitter.elephantbird.pig.load.SequenceFileLoader' with arguments '[-c
>> com.twitter.elephantbird.pig.util.IntWritableConverter, -c
>> com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse]'
>>
>>
>> Caused by: java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>        at com.twitter.elephantbird.pig.load.SequenceFileLoader.getWritableConverter(SequenceFileLoader.java:233)
>>        at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:152)
>>        at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:175)
>>        ... 21 more
>> Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>
>>
>> On May 15, 2012, at 7:01 AM, Andy Schlaikjer wrote:
>>
>> > Yohan, that's a typo in the VectorWritableConverter javadoc. I'll update
>> > it today.
>> >
>> > The SequenceFileStorage and ...Loader classes are in separate packages:
>> >
>> > com.twitter.elephantbird.pig.*load*.SequenceFileLoader
>> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>
>> >
>> > com.twitter.elephantbird.pig.*store*.SequenceFileStorage
>> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>
>> >
>> >
>> > Both of these classes rely on the WritableConverter interface
>> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>.
>> > They classload converters at runtime, given the classname of the
>> > converters you'd like to use for key and value Writable instances. When
>> > dealing with SequenceFile<IntWritable, VectorWritable> data, do this:
>> >
>> > {{{
>> >
>> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
>> > %declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
>> > %declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';
>> >
>> > pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
>> >  '-c $INT_CONVERTER',
>> >  '-c $VECTOR_CONVERTER -- -sparse'
>> > );
>> >
>> > }}}
>> >
>> > Hope this helps!
>> >
>> > Andy
>> >
>> >
>> > On Mon, May 14, 2012 at 11:57 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> >> Sounds like a class path issue.
>> >>
>> >> On May 15, 2012, at 2:43 AM, Yohan Chin <yohan.jin@gmail.com> wrote:
>> >>
>> >>>
>> >>> Hi,
>> >>> Recently, I've tried to use elephant-bird to load Mahout results
>> >>> into Pig.
>> >>> I was able to install elephant-bird and build the .jar file, and
>> >>> followed the instructions below (written by Andy Schlaikjer):
>> >>>
>> >>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
>> >>> ex)
>> >>> pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
>> >>> '-c $INT_CONVERTER',
>> >>> '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
>> >>> );
>> >>> However, there is no SequenceFileLoader in the store folder, and
>> >>> load/SequenceFileLoader.java doesn't import
>> >>> com.twitter.elephantbird.pig.mahout.VectorWritableConverter.
>> >>>
>> >>> Is there anything I've missed?
>> >>>
>> >>> Thanks a lot for this awesome API!
>> >>>
>>
>>
>
