mahout-user mailing list archives

From Mihai Josan <Mihai.Jo...@iquestgroup.com>
Subject RE: RE: how to use a custom distance measure with kmeans?
Date Tue, 19 Feb 2013 09:08:54 GMT
Hello,

I managed to resolve the problem without modifying the Mahout script.
I inserted my classes into the mahout job jar  (mahout-examples-0.7-cdh4.1.2-job.jar) and
everything is ok now.

Thank you very much for your help,
Mihai Josan
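
For anyone taking the same route, here is a minimal sketch of how compiled classes could be added to an existing job jar with the JDK's `jar` tool. The message above does not say which tool was actually used, so this is an assumption; the `update_job_jar` helper name and the `DRY_RUN` switch are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical helper: add compiled classes to an existing job jar.
# Usage: update_job_jar <job-jar> <classes-dir>
# With DRY_RUN=1 the command is printed instead of executed.
update_job_jar() {
    job_jar=$1
    classes_dir=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "jar uf $job_jar -C $classes_dir ."
    else
        # 'jar uf' updates the archive in place; '-C dir .' adds everything
        # under dir, keeping package directories (e.g. clustering/) intact.
        jar uf "$job_jar" -C "$classes_dir" .
    fi
}
```

A call might look like `update_job_jar /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar target/classes`, where `target/classes` is an assumed build output directory. Keeping a backup copy of the job jar before updating it is probably wise.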

-----Original Message-----
From: Mihai Josan [mailto:Mihai.Josan@iquestgroup.com] 
Sent: Thursday, February 14, 2013 6:05 PM
To: user@mahout.apache.org
Subject: [ Non iQuest : could be Junk ] RE: how to use a custom distance measure with kmeans?

I modified line 251 like this:

export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}:$CLASSPATH

Now I no longer get the original ClassNotFoundException, but I get:

Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector

I found a long discussion of this error at http://mail-archives.apache.org/mod_mbox/mahout-user/201105.mbox/browser
called: The perennial "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector" problem.
At the moment I am still looking for a solution to this problem.
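
The way line 251 joins the three parts can be sketched in isolation. This is only an illustration: `join_classpath` is a hypothetical helper, not part of the mahout script, and it skips empty parts so that an unset variable does not leave an empty entry in the result. It does not address the `org.apache.mahout.math.Vector` error.

```shell
#!/bin/sh
# Hypothetical helper: join non-empty classpath parts with ':'.
join_classpath() {
    result=""
    for part in "$@"; do
        [ -n "$part" ] || continue      # skip unset or empty parts
        if [ -z "$result" ]; then
            result=$part
        else
            result=$result:$part
        fi
    done
    printf '%s\n' "$result"
}

# Same order as the modified line 251 quoted above.
HADOOP_CLASSPATH=$(join_classpath "${MAHOUT_CONF_DIR:-}" "${HADOOP_CLASSPATH:-}" "${CLASSPATH:-}")
export HADOOP_CLASSPATH
```

Printing `$HADOOP_CLASSPATH` right before the hadoop call is a quick way to confirm that the custom jar is really in the value that gets exported.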

-----Original Message-----
From: Dan Filimon [mailto:dangeorge.filimon@gmail.com]
Sent: Thursday, February 14, 2013 12:25 PM
To: user@mahout.apache.org
Subject: Re: how to use a custom distance measure with kmeans?

I can think of only 2 possibilities:

- In the script, I think execution goes through the if statements to line 251, where HADOOP_CLASSPATH
is set. That line differs from line 243, where the CLASSPATH you set also gets added. So it seems
that the CLASSPATH you set isn't being passed to hadoop. From the look of your exec, this seems to be the case.
To fix it, add :$CLASSPATH at the end of line 251.

- The class you're referring to isn't fully qualified (but this looks unlikely).
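
To check the second possibility, the name passed to -dm can be compared with the entries in the jar: a fully qualified class name maps to a jar entry by replacing the dots with slashes and appending .class. The sketch below assumes the JDK's `jar` tool is on the PATH; `class_to_entry` and `jar_has_class` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: map a fully qualified class name to its jar entry path.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if the jar lists the entry for the given class.
# Usage: jar_has_class <jar-file> <fully.qualified.ClassName>
jar_has_class() {
    jar tf "$1" | grep -qxF "$(class_to_entry "$2")"
}
```

For example, `jar_has_class /usr/lib/mahout/lib/abacDistance.jar clustering.AbacDistanceMeasure && echo found` would confirm that the jar contains the class under the package path that the name implies.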

I'm not sure why the mahout script has those cases starting at line 239
(especially since they are undocumented).

Let us know if it works!

On Wed, Feb 13, 2013 at 6:18 PM, Mihai Josan <Mihai.Josan@iquestgroup.com> wrote:
> Hello,
>
> After I made the changes, I still get the Class not found exception. I created my project
> using Maven and Eclipse, and the jar is generated with Eclipse's JAR export. Do you have any
> other ideas on how to resolve this problem?
>
> mahout kmeans -i /user/rhadoop/mahout/abac-out/sequence  \
>        -c  /user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0 \
>        -o  /user/rhadoop/mahout/abac-out/clusters-out/ \
>        -x 10 \
>        -dm clustering.AbacDistanceMeasure \
>        -ow
>
> **how the $CLASSPATH looks after line 120:
> CLASSPATH   /usr/lib/sqoop/postgresql-9.2-1002.jdbc4.jar:/etc/mahout/conf.dist:/usr/lib/mahout/lib/abacDistance.jar
>
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
> HADOOP_CLASSPATH   /etc/mahout/conf.dist:/usr/lib/mahout/mahout-examples-*-job.jar:/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar
>
> ** the exec command:
> CMD: /usr/lib/hadoop/bin/hadoop jar /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar org.apache.mahout.driver.MahoutDriver kmeans -i /user/rhadoop/mahout/abac-out/sequence -c /user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0 -o /user/rhadoop/mahout/abac-out/clusters-out/ -x 10 -dm clustering.AbacDistanceMeasure -ow
>
> 13/02/13 17:59:08 INFO common.AbstractJob: Command line arguments: {--clusters=[/user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0], --convergenceDelta=[0.5], --distanceMeasure=[clustering.AbacDistanceMeasure], --endPhase=[2147483647], --input=[/user/rhadoop/mahout/abac-out/sequence], --maxIter=[10], --method=[mapreduce], --output=[/user/rhadoop/mahout/abac-out/clusters-out/], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: clustering.AbacDistanceMeasure
>         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.ClassNotFoundException: clustering.AbacDistanceMeasure
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:169)
>         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>         ... 15 more
>
> Thank you,
> Mihai Josan
>
> -----Original Message-----
> From: Dan Filimon [mailto:dangeorge.filimon@gmail.com]
> Sent: Wednesday, February 13, 2013 1:25 PM
> To: user@mahout.apache.org
> Subject: Re: how to use a custom distance measure with kmeans?
>
> Sure, that sounds like an even better solution!
> I didn't read the entire script. :)
>
> On Wed, Feb 13, 2013 at 6:40 AM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:
>> Hi Dan,
>>
>> If we copy the jar containing the custom classes to the
>> MAHOUT_HOME/lib folder, won't that work fine?
>> At line 147 of the mahout script, it reads all jars
>> under the lib folder and puts them on the classpath.
>>
>> If this doesn't work, there should probably be some better
>> way to add custom classes to the classpath rather than users
>> modifying the script file.
>>
>> Thanks,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>> On Tue, Feb 12, 2013 at 10:18 PM, Dan Filimon
>> <dangeorge.filimon@gmail.com> wrote:
>>
>>> You need to add the JAR containing the distance measure you want to 
>>> the classpath.
>>> By default the CLASSPATH is set in line 120 of the mahout script.
>>> (the script itself is in the bin/ folder of your Mahout installation).
>>>
>>> Sadly, I don't think the script allows you to set the classpath by
>>> default, but it should be a simple addition.
>>> You can either:
>>> a. add the path to your JAR/class folder manually at line 120, or
>>> b. (the cleaner way) add a new variable called something like
>>> MAHOUT_EXTRA_CLASSPATH to line 120, which you can set to whatever
>>> you need.
>>>
>>> b. is a bit cleaner, but you need to modify the script either way.
>>>
>>> Alternatively, if you dislike fudging with the script you can have a 
>>> closer look at it and see that running 'mahout classpath' gives you 
>>> the classpath it builds. Then you can run the hadoop script directly 
>>> like in line 252 of the script and edit the HADOOP_CLASSPATH (see 
>>> http://stackoverflow.com/questions/3799679/how-to-run-a-hadoop-program).
>>>
>>> This should really be better documented. Sorry you're having trouble!
>>>
>>> Good luck! :)
>>>
>>> On Tue, Feb 12, 2013 at 6:30 PM, Mihai Josan 
>>> <Mihai.Josan@iquestgroup.com> wrote:
>>> > This is the error I receive:
>>> >
>>> > mahout kmeans -i /user/rhadoop/in/sequence/ \
>>> >        -c  /user/rhadoop/out/canopy-centroids/clusters-0 \
>>> >        -o  /user/rhadoop/out/clusters-out/ \
>>> >        -x 10 \
>>> >        -dm /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>>> >
>>> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>> > Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
>>> > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
>>> > 13/02/12 17:05:57 INFO common.AbstractJob: Command line arguments: {--clusters=[/user/rhadoop/out/canopy-centroids/clusters-0], --convergenceDelta=[0.5], --distanceMeasure=[/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class], --endPhase=[2147483647], --input=[/user/rhadoop/in/sequence/], --maxIter=[10], --method=[mapreduce], --output=[/user/rhadoop/out/clusters-out2/], --startPhase=[0], --tempDir=[temp]}
>>> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>>> >         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
>>> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>> > Caused by: java.lang.ClassNotFoundException: /home/rhadoop/projects/besmart/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>>> >         at java.lang.Class.forName0(Native Method)
>>> >         at java.lang.Class.forName(Class.java:169)
>>> >         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>>> >         ... 15 more
>>> >
>>> >
>>> > Is this the proper way to use the custom distance measure, or
>>> > should I package the class? If so, how?
>>> >
>>> > Thank you in advance,
>>> > Mihai Josan
>>> >
>>> >> Are you getting any errors?
>>> >> Can you specify the fully qualified class name of your distance
>>> >> measure (like com.xxx.MyDistanceMeasure) and check?
>>> >>
>>> >> Best,
>>> >> Mahesh Balija,
>>> >> Calsoft Labs.
>>> >>
>>> >>
>>> >> On Tue, Feb 12, 2013 at 2:28 PM, Mihai Josan <Mihai.Josan@iquestgroup.com> wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > Can you please tell me how I can use a custom distance measure
>>> >> > with Mahout from the command line?
>>> >> > I am trying to run a clustering using this distance measure, like this:
>>> >> >
>>> >> > mahout kmeans -i in/sequence/ \
>>> >> >        -c  out/centroids/clusters-0 \
>>> >> >        -o  out/clusters-out/ \
>>> >> >        -x 10 \
>>> >> >        -dm MyDistanceMeasure \
>>> >> >        -ow
>>> >> >
>>> >> > Thank you in advance,
>>> >> > Mihai
>>> >> >
>>>