mahout-user mailing list archives

From "Periya.Data" <periya.d...@gmail.com>
Subject Error (Re: Help in running from command line [WeightedPropertyVectorWritable])
Date Sun, 15 Jan 2012 05:13:35 GMT
As Sean suggested, I have given SimpleKMeansClustering.java the package
org.apache.mahout.clustering.kmeans and stored it in the matching directory.
mahout-core then compiled successfully.

I am now running it like this:
~/CDH3/mahout/core/target/classes$ java -cp \
  $CLASSPATH:../mahout-core-0.6-SNAPSHOT.jar:../mahout-core-0.6-SNAPSHOT-job.jar:../../../math/target/mahout-math-0.6-SNAPSHOT.jar \
  org.apache.mahout.clustering.kmeans.SimpleKMeansClustering

Error:
12/01/14 21:02:05 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0004_m_000000_0' to output/clusteredPoints
12/01/14 21:02:05 INFO mapred.LocalJobRunner:
12/01/14 21:02:05 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done.
12/01/14 21:02:06 INFO mapred.JobClient:  map 100% reduce 0%
12/01/14 21:02:06 INFO mapred.JobClient: Job complete: job_local_0004
12/01/14 21:02:06 INFO mapred.JobClient: Counters: 6
12/01/14 21:02:06 INFO mapred.JobClient:   FileSystemCounters
12/01/14 21:02:06 INFO mapred.JobClient:     FILE_BYTES_READ=6533401
12/01/14 21:02:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6779470
12/01/14 21:02:06 INFO mapred.JobClient:   Map-Reduce Framework
12/01/14 21:02:06 INFO mapred.JobClient:     Map input records=9
12/01/14 21:02:06 INFO mapred.JobClient:     Spilled Records=0
12/01/14 21:02:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=144
12/01/14 21:02:06 INFO mapred.JobClient:     Map output records=9
Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.WeightedPropertyVectorWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1889)
    at org.apache.mahout.clustering.kmeans.SimpleKMeansClustering.main(SimpleKMeansClustering.java:95)
pd@PeriyaData:~/CDH3/mahout/core/target/classes$

=============

I do not understand what this error means. I am digging into it. If anyone
has a clue, please let me know. SimpleKMeansClustering.java is a 100%
copy and paste from MiA; I have not changed anything. Maybe I am missing
some jar files in my -cp?
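
One possible reading of the message, offered as an assumption rather than a
diagnosis: SequenceFile.Reader.next(key, value) reports "wrong value class"
when the value object handed to it is not of the class recorded in the file
header. Here "0.0: null" would be the printed form of the value object that
the MiA code passes in, while the file written by the 0.6-SNAPSHOT build
records WeightedPropertyVectorWritable. The class names are stored as readable
text near the start of a SequenceFile, so a rough shell sketch like the one
below can show what a file actually contains. The function name
seq_header_classes and the part-m-00000 file name are illustrative guesses,
not taken from the log.

```shell
# Print the dotted names that appear in the first bytes of a SequenceFile.
# The header stores the key and value class names as plain text, each
# preceded by a raw length byte, so a stray leading character can show up.
seq_header_classes() {
  head -c 256 "$1" | LC_ALL=C tr -c '[:alnum:]._$' '\n' | grep '\.'
}

# Hypothetical usage (the part file name is an assumption):
#   seq_header_classes output/clusteredPoints/part-m-00000
```

If the value class printed there differs from the one the program
instantiates, creating the value object from reader.getValueClass() instead
of a hard-coded class is one way to keep the reading loop independent of the
Mahout version.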

Thanks again,
pd.





On Sat, Jan 14, 2012 at 7:54 PM, Periya.Data <periya.data@gmail.com> wrote:

> Thanks Lance.
>
> The reason(s) why I am asking this specific question :
> - I am new to Mahout
> - I am sort of new to Java itself. (core C/C++ programmer). Learning the
> basics of pom.xml.
> - I do not want to use Eclipse IDE now...though I have used it before. I
> really do not know what goes on "behind the scenes" if I use IDE. So, I
> want to do everything on command line and understand.
> - I have used the regular shell script for running mahout earlier, like
> the following:
>
>   $MAHOUT_HOME/bin/mahout kmeans \
>       --input /input/mahout/vectorized/tfidf-vectors \
>       --output $HDFS_OUTPUT_DIR/bigdata-canopy-centroids \
>       --clusters $HDFS_OUTPUT_DIR/bigdata-canopy-centroids/clusters-0 \
>       ....
>
> - The Mahout in Action book assumes that the reader can easily compile and
> run the programs. (which I am unable to). Please see this text in chapter 7
> of the book on KMeans.
>
> *"7.3.3  Analyzing the output*
> Compile and run the code in listing 7.2 using your favorite IDE or do it
> from the command line. Make sure you add all the Mahout dependency JAR
> files to the classpath. Because our set of data is small, you’ll get the
> following output in a matter of seconds:"
>
>
> In other words, I really want to write a simple java clustering program
> (say using Kmeans), compile and run from command-line....just like any
> other normal java program. I am unable to do this simple stuff now. Any
> step-by-step instructions on this would give me a good start. At this
> stage, I need a little spoon-feeding.
>
> Appreciate your help,
> PD
>
>
>
> On Sat, Jan 14, 2012 at 6:23 PM, Lance Norskog <goksron@gmail.com> wrote:
>
>> Ah! A command-line invocation works from maven. mahout/bin/mahout is a
>> shell script that wraps up a bunch of handy things and runs java for
>> you. You can just say from the top level (if your class has a main):
>>
>> bin/mahout org.apache.mahout.package.class arg1 arg2 ... argN
>>
>> The problem with 'java -cp' is that the Maven repository downloader
>> parks every jar in a separate directory. 'mvn' has a wrapper that runs
>> java apps. Look at the mvn calls in this page:
>>
>>
>> http://www.lucidimagination.com/search/link?url=https://cwiki.apache.org/confluence/display/MAHOUT/RecommendationExamples
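
Since the jars sit in separate directories, a classpath for plain "java -cp"
has to name each one. A minimal sketch of collecting them, assuming the jars
live somewhere under a single directory tree; jars_to_classpath is a
hypothetical helper, and the Maven local repository path in the usage comment
is the default location, which may differ on a given machine.

```shell
# Join every .jar found under a directory tree into one ':'-separated
# classpath string, in sorted order so the result is repeatable.
jars_to_classpath() {
  find "$1" -name '*.jar' | sort | paste -sd: -
}

# Hypothetical usage (paths and class name are assumptions):
#   java -cp "$(jars_to_classpath "$HOME/.m2/repository/org/apache/mahout"):target/classes" \
#        hw.mahout.kmeans.SimpleKMeansClustering
```

Picking up every jar under a tree can pull in several versions of the same
artifact, so pointing the helper at a narrower directory is the safer choice.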
>>
>> On Sat, Jan 14, 2012 at 5:04 PM, Periya.Data <periya.data@gmail.com>
>> wrote:
>> > Thanks. About renaming packages -> I wanted to experiment with modified
>> > code at a later time and do not want to change the original. I am
>> > building/compiling from a different place.
>> >
>> > Also, as a newbie, I thought I would know exactly what is needed to run
>> > KMeans if I "gradually" build up my pom.xml, rather than take what is
>> > already there, which might have a lot of unnecessary modules packaged up.
>> >
>> > Finally, a sample command line for execution will be helpful. "java -cp
>> > ...". I shall try the universal job file as well.
>> >
>> > Thanks for your feedback,
>> > PD.
>> >
>> > On Sat, Jan 14, 2012 at 3:09 PM, Lance Norskog <goksron@gmail.com> wrote:
>> >
>> >> I conflated two different things: 1) what you said, and 2) a newbie
>> >> will have a much easier time trying out the MIA code against the 0.5
>> >> release.
>> >>
>> >> On Sat, Jan 14, 2012 at 3:35 AM, Sean Owen <srowen@gmail.com> wrote:
>> >> > I don't think this has anything to do with using 0.5 vs 0.6 per se.
>> >> > All of this surgery is unnecessary. You simply need to use the .job
>> >> > files, which package all dependencies into one .jar, rather than
>> >> > individual jars.
>> >> >
>> >> > utils is now integration.
>> >> >
>> >> > You should not need to rename packages, not sure what you mean there.
>> >> >
>> >> > Sean
>> >> >
>> >> > On Sat, Jan 14, 2012 at 4:21 AM, Lance Norskog <goksron@gmail.com> wrote:
>> >> >> The code for Mahout in Action is written against the Mahout 0.5
>> >> >> release. The trunk has changed a lot since then. You can change your
>> >> >> pom.xml dependencies to Mahout 0.5 and it should work better.
>> >> >>
>> >> >> You should start with this file, then add your changes.
>> >> >>
>> >> >>
>> >> >> https://github.com/tdunning/MiA/blob/12a0a53757ba49142ab69f94c002ff21650cb3f0/MiA/pom.xml
>> >> >>
>> >> >> Lance
>> >> >>
>> >> >> On Thu, Jan 12, 2012 at 8:07 PM, Periya.Data <periya.data@gmail.com> wrote:
>> >> >>> Hi,
>> >> >>>    I am new to Mahout and began exploring the clustering examples. I
>> >> >>> basically took the example code of SimpleKMeansClustering (from Mahout
>> >> >>> in Action) and am trying to run it. The following is what I did:
>> >> >>>
>> >> >>> 1 - made sure I renamed the package name in the java file appropriately.
>> >> >>> 2 - made sure hadoop is running (in pseudo-distributed mode).
>> >> >>> 3 - mvn clean install. My pom.xml file is pasted at the bottom of this
>> >> >>> email. The result is as follows:
>> >> >>>
>> >> >>> pd@PeriyaData:~/Mahout/clustering/target$ ls -l
>> >> >>> total 28
>> >> >>> drwxrwxr-x 3 pd pd 4096 2012-01-12 19:51 classes
>> >> >>> -rw-rw-r-- 1 pd pd 5173 2012-01-12 19:51 clustering-1.0-SNAPSHOT.jar
>> >> >>> drwxrwxr-x 4 pd pd 4096 2012-01-12 19:51 generated-sources
>> >> >>> drwxrwxr-x 2 pd pd 4096 2012-01-12 19:51 maven-archiver
>> >> >>> drwxrwxr-x 2 pd pd 4096 2012-01-12 19:51 surefire-reports
>> >> >>> drwxrwxr-x 3 pd pd 4096 2012-01-12 19:51 test-classes
>> >> >>> pd@PeriyaData:~/Mahout/clustering/target$
>> >> >>>
>> >> >>> 3 - Trying to run it by "java -classpath..." etc. Note: my classpath
>> >> >>> does not have mahout-utils.jar. It is missing in my build.
>> >> >>>
>> >> >>> pd@PeriyaData:~/Mahout/clustering/target/classes$ *java -cp
>> >> >>> ../clustering-1.0-SNAPSHOT.jar:~/CDH3/mahout/core/target/classes:~/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar:~/CDH3/mahout/math/target/mahout-math-0.6-SNAPSHOT.jar
>> >> >>> hw.mahout.kmeans.SimpleKMeansClustering *
>> >> >>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> >>> org/apache/mahout/common/distance/DistanceMeasure
>> >> >>> Caused by: java.lang.ClassNotFoundException:
>> >> >>> org.apache.mahout.common.distance.DistanceMeasure
>> >> >>>    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> >> >>>    at java.security.AccessController.doPrivileged(Native Method)
>> >> >>>    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> >> >>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> >> >>>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> >> >>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> >> >>> Could not find the main class: hw.mahout.kmeans.SimpleKMeansClustering.
>> >> >>> Program will exit.
>> >> >>> pd@PeriyaData:~/Mahout/clustering/target/classes$
>> >> >>>
>> >> >>> =============================
>> >> >>> questions:
>> >> >>>
>> >> >>>
>> >> >>>   1. I do not have a mahout-utils.jar file, for some strange reason. I
>> >> >>>   am using Mahout 0.6. I tried recompiling Mahout twice using mvn clean
>> >> >>>   install. Still I cannot find mahout-utils-0.6.jar. Perhaps
>> >> >>>   that is a problem. I have mahout-core, mahout-examples and mahout-math.
>> >> >>>   2. Is the command syntax "java -cp ..." correct in step 3? Please advise.
>> >> >>>   3. Is my pom.xml sufficient for this build? Please note that in
>> >> >>>   pom.xml, I have mahout core and others as 0.5 version. For some strange
>> >> >>>   reason, if I have 0.6, the maven build fails and complains that 4
>> >> >>>   artifacts are missing: the mahout-core, mahout-math, mahout-utils and
>> >> >>>   mahout-examples jar files. Is there a fix for this?
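
On question 2, one detail worth checking (a guess from the command as pasted,
not a confirmed diagnosis): in an ordinary command argument bash expands "~"
only at the start of the word, so the "~/CDH3/..." entries that follow a ":"
reach java as literal text and name no real path. That would leave mahout-core
off the classpath, which is consistent with the NoClassDefFoundError for
DistanceMeasure. The sketch below shows the difference; show_arg is a made-up
helper and the classpath is shortened from the command above.

```shell
# Echo back exactly what a program would receive as its first argument.
show_arg() { printf '%s\n' "$1"; }

# The word starts with "..", so the '~' after ':' is passed through literally.
show_arg ../clustering-1.0-SNAPSHOT.jar:~/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar

# $HOME is substituted anywhere in the word, so java would see a real path.
show_arg "../clustering-1.0-SNAPSHOT.jar:$HOME/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar"
```

Writing the entries with $HOME (or as relative paths) avoids the problem
without changing anything else in the command.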
>> >> >>>
>> >> >>>
>> >> >>> ==================
>> >> >>>
>> >> >>> pom.xml
>> >> >>>
>> >> >>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>> >> >>>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >> >>>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>> >> >>> http://maven.apache.org/maven-v4_0_0.xsd">
>> >> >>>  <modelVersion>4.0.0</modelVersion>
>> >> >>>
>> >> >>>  <parent>
>> >> >>>    <artifactId>mahout</artifactId>
>> >> >>>    <groupId>org.apache.mahout</groupId>
>> >> >>>    <version>0.4</version>
>> >> >>>  </parent>
>> >> >>>
>> >> >>>
>> >> >>>  <groupId>hw.mahout.kmeans</groupId>
>> >> >>>  <artifactId>clustering</artifactId>
>> >> >>>  <packaging>jar</packaging>
>> >> >>>  <version>1.0-SNAPSHOT</version>
>> >> >>>  <name>clustering</name>
>> >> >>>  <url>http://maven.apache.org</url>
>> >> >>>
>> >> >>> <dependencies>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-core</artifactId>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-core</artifactId>
>> >> >>>      <type>test-jar</type>
>> >> >>>      <scope>test</scope>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-math</artifactId>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-math</artifactId>
>> >> >>>      <type>test-jar</type>
>> >> >>>      <scope>test</scope>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-utils</artifactId>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.mahout</groupId>
>> >> >>>      <artifactId>mahout-examples</artifactId>
>> >> >>>      <version>0.5</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>com.google.guava</groupId>
>> >> >>>      <artifactId>guava</artifactId>
>> >> >>>      <version>r03</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.thrift</groupId>
>> >> >>>      <artifactId>libthrift</artifactId>
>> >> >>>      <version>0.6.1</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.slf4j</groupId>
>> >> >>>      <artifactId>slf4j-log4j12</artifactId>
>> >> >>>      <version>1.5.11</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.apache.hadoop</groupId>
>> >> >>>      <artifactId>zookeeper</artifactId>
>> >> >>>      <version>3.3.1</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>org.twitter4j</groupId>
>> >> >>>      <artifactId>twitter4j-stream</artifactId>
>> >> >>>      <version>2.2.3</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>        <groupId>commons-io</groupId>
>> >> >>>        <artifactId>commons-io</artifactId>
>> >> >>>        <version>2.0.1</version>
>> >> >>>        <type>jar</type>
>> >> >>>        <scope>compile</scope>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>    <dependency>
>> >> >>>      <groupId>commons-logging</groupId>
>> >> >>>      <artifactId>commons-logging</artifactId>
>> >> >>>      <version>1.1.1</version>
>> >> >>>    </dependency>
>> >> >>>
>> >> >>>  </dependencies>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> <!--
>> >> >>>  <build>
>> >> >>>    <plugins>
>> >> >>>      <plugin>
>> >> >>>        <groupId>org.apache.maven.plugins</groupId>
>> >> >>>        <artifactId>maven-compiler-plugin</artifactId>
>> >> >>>        <version>2.3.2</version>
>> >> >>>        <configuration>
>> >> >>>          <encoding>UTF-8</encoding>
>> >> >>>          <source>1.6</source>
>> >> >>>          <target>1.6</target>
>> >> >>>          <optimize>true</optimize>
>> >> >>>        </configuration>
>> >> >>>      </plugin>
>> >> >>>      <plugin>
>> >> >>>        <groupId>org.apache.maven.plugins</groupId>
>> >> >>>        <artifactId>maven-antrun-plugin</artifactId>
>> >> >>>        <version>1.6</version>
>> >> >>>      </plugin>
>> >> >>>      <plugin>
>> >> >>>        <groupId>org.apache.maven.plugins</groupId>
>> >> >>>        <artifactId>maven-resources-plugin</artifactId>
>> >> >>>        <version>2.4.3</version>
>> >> >>>        <configuration>
>> >> >>>          <encoding>UTF-8</encoding>
>> >> >>>        </configuration>
>> >> >>>      </plugin>
>> >> >>>
>> >> >>>      <plugin>
>> >> >>>        <groupId>org.apache.maven.plugins</groupId>
>> >> >>>        <artifactId>maven-assembly-plugin</artifactId>
>> >> >>>        <executions>
>> >> >>>          <execution>
>> >> >>>            <id>job</id>
>> >> >>>            <phase>package</phase>
>> >> >>>            <goals>
>> >> >>>              <goal>single</goal>
>> >> >>>            </goals>
>> >> >>>            <configuration>
>> >> >>>              <descriptors>
>> >> >>>                <descriptor>src/main/assembly/job.xml</descriptor>
>> >> >>>              </descriptors>
>> >> >>>            </configuration>
>> >> >>>          </execution>
>> >> >>>          <execution>
>> >> >>>            <id>my-jar-with-dependencies</id>
>> >> >>>            <phase>package</phase>
>> >> >>>            <goals>
>> >> >>>              <goal>single</goal>
>> >> >>>            </goals>
>> >> >>>            <configuration>
>> >> >>>              <descriptorRefs>
>> >> >>>                <descriptorRef>jar-with-dependencies</descriptorRef>
>> >> >>>              </descriptorRefs>
>> >> >>>            </configuration>
>> >> >>>          </execution>
>> >> >>>        </executions>
>> >> >>>      </plugin>
>> >> >>>    </plugins>
>> >> >>>  </build>
>> >> >>>
>> >> >>> -->
>> >> >>>
>> >> >>> </project>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> Thanks very much,
>> >> >>>
>> >> >>> PD.
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lance Norskog
>> >> >> goksron@gmail.com
>> >>
>> >>
>> >>
>> >> --
>> >> Lance Norskog
>> >> goksron@gmail.com
>> >>
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
