mahout-user mailing list archives

From "Periya.Data" <periya.d...@gmail.com>
Subject Re: Help in running from command line
Date Sun, 15 Jan 2012 03:54:53 GMT
Thanks, Lance.

These are the reasons I am asking this specific question:
- I am new to Mahout
- I am fairly new to Java itself (I am mainly a C/C++ programmer), and I am
learning the basics of pom.xml.
- I do not want to use the Eclipse IDE for now, though I have used it
before. I really do not know what goes on "behind the scenes" when I use an
IDE, so I want to do everything on the command line and understand it.
- I have used the regular shell script to run Mahout before, like the
following:
  $MAHOUT_HOME/bin/mahout kmeans --input /input/mahout/vectorized/tfidf-vectors \
      --output $HDFS_OUTPUT_DIR/bigdata-canopy-centroids \
      --clusters $HDFS_OUTPUT_DIR/bigdata-canopy-centroids/clusters-0 \
      ....

- The Mahout in Action book assumes that the reader can easily compile and
run the programs (which I am unable to do). Please see this passage in
chapter 7 of the book, on k-means:

"7.3.3 Analyzing the output
Compile and run the code in listing 7.2 using your favorite IDE or do it
from the command line. Make sure you add all the Mahout dependency JAR files
to the classpath. Because our set of data is small, you'll get the following
output in a matter of seconds:"


In other words, I really want to write a simple Java clustering program
(say, using k-means), then compile and run it from the command line, just
like any other normal Java program. I am unable to do even this simple
thing right now. Step-by-step instructions would give me a good start; at
this stage, I need a little spoon-feeding.

Appreciate your help,
PD


On Sat, Jan 14, 2012 at 6:23 PM, Lance Norskog <goksron@gmail.com> wrote:

> Ah! A command-line invocation works from Maven. mahout/bin/mahout is a
> shell script which wraps up a bunch of handy things and runs java for
> you. You can just say from the top level (if your class has a main):
>
> bin/mahout org.apache.mahout.package.class arg1 arg2 ... argN
>
> The problem with 'java -cp' is that the Maven repository downloader
> parks every jar in a separate directory. 'mvn' has a wrapper (the exec
> plugin's exec:java goal) that runs Java apps. Look at the mvn calls on this page:
>
>
> http://www.lucidimagination.com/search/link?url=https://cwiki.apache.org/confluence/display/MAHOUT/RecommendationExamples
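Because the jars end up in many different directories, typing the classpath by hand is error-prone. A minimal bash sketch, assuming the needed jars have first been gathered under one directory (for example with `mvn dependency:copy-dependencies`, which by default copies a project's dependencies into `target/dependency`): the hypothetical function `build_classpath` joins every `.jar` found below that directory into one colon-separated string. Pointing it at the whole local Maven repository would pull in every version of everything, so a project-specific directory is the safer input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a colon-separated classpath listing every .jar
# file found (recursively) under the given directory, sorted so the order
# is stable from run to run. Assumes jar paths contain no newlines.
build_classpath() {
  local dir="$1"
  find "$dir" -type f -name '*.jar' | LC_ALL=C sort | paste -s -d ':' -
}
```

Usage would then look like `java -cp "target/clustering-1.0-SNAPSHOT.jar:$(build_classpath target/dependency)" hw.mahout.kmeans.SimpleKMeansClustering`. Since Java 6, a quoted classpath entry such as `"target/dependency/*"` also picks up every jar directly inside that one directory (not its subdirectories), which can replace the helper when everything sits in a single folder.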
>
> On Sat, Jan 14, 2012 at 5:04 PM, Periya.Data <periya.data@gmail.com> wrote:
> > Thanks. Regarding renaming packages: I want to experiment with modified
> > code later and do not want to change the original. I am
> > building/compiling from a different place.
> >
> > Also, as a newbie, I thought I would learn exactly what is needed to run
> > KMeans if I build up my pom.xml "gradually", rather than taking the one
> > that is already there, which might package a lot of unnecessary modules.
> >
> > Finally, a sample command line for execution ("java -cp ...") would be
> > helpful. I shall try the universal job file as well.
> >
> > Thanks for your feedback,
> > PD.
> >
> > On Sat, Jan 14, 2012 at 3:09 PM, Lance Norskog <goksron@gmail.com> wrote:
> >
> >> I conflated two different things: 1) what you said, and 2) a newbie
> >> will have a much easier time trying out the MIA code against the 0.5
> >> release.
> >>
> >> On Sat, Jan 14, 2012 at 3:35 AM, Sean Owen <srowen@gmail.com> wrote:
> >> > I don't think this has anything to do with using 0.5 vs 0.6 per se.
> >> > All of this surgery is unnecessary. You simply need to use the .job
> >> > files, which package all dependencies into one .jar, rather than
> >> > individual jars.
> >> >
> >> > The utils module is now called integration (mahout-integration).
> >> >
> >> > You should not need to rename packages; I'm not sure what you mean there.
> >> >
> >> > Sean
> >> >
> >> > On Sat, Jan 14, 2012 at 4:21 AM, Lance Norskog <goksron@gmail.com> wrote:
> >> >> The code for Mahout in Action is written against the Mahout 0.5 release.
> >> >> The trunk has changed a lot since then. You can change your pom.xml
> >> >> dependencies to Mahout 0.5 and it should work better.
> >> >>
> >> >> You should start with this file, then add your changes.
> >> >>
> >> >>
> >> >> https://github.com/tdunning/MiA/blob/12a0a53757ba49142ab69f94c002ff21650cb3f0/MiA/pom.xml
> >> >>
> >> >> Lance
> >> >>
> >> >> On Thu, Jan 12, 2012 at 8:07 PM, Periya.Data <periya.data@gmail.com> wrote:
> >> >>> Hi,
> >> >>>    I am new to Mahout and have begun exploring the clustering examples. I
> >> >>> basically took the SimpleKMeansClustering example code (from Mahout in
> >> >>> Action) and am trying to run it. This is what I did:
> >> >>>
> >> >>> 1 - made sure I renamed the package name in the Java file appropriately.
> >> >>> 2 - made sure Hadoop is running (in pseudo-distributed mode).
> >> >>> 3 - ran mvn clean install. My pom.xml file is pasted at the bottom of
> >> >>> this email. The result is as follows:
> >> >>>
> >> >>> pd@PeriyaData:~/Mahout/clustering/target$ ls -l
> >> >>> total 28
> >> >>> drwxrwxr-x 3 pd pd 4096 2012-01-12 19:51 classes
> >> >>> -rw-rw-r-- 1 pd pd 5173 2012-01-12 19:51 clustering-1.0-SNAPSHOT.jar
> >> >>> drwxrwxr-x 4 pd pd 4096 2012-01-12 19:51 generated-sources
> >> >>> drwxrwxr-x 2 pd pd 4096 2012-01-12 19:51 maven-archiver
> >> >>> drwxrwxr-x 2 pd pd 4096 2012-01-12 19:51 surefire-reports
> >> >>> drwxrwxr-x 3 pd pd 4096 2012-01-12 19:51 test-classes
> >> >>> pd@PeriyaData:~/Mahout/clustering/target$
> >> >>>
> >> >>> 3 - tried to run it with "java -classpath ..." etc. Note: my classpath
> >> >>> does not have mahout-utils.jar; it is missing from my build.
> >> >>> pd@PeriyaData:~/Mahout/clustering/target/classes$ java -cp \
> >> >>> ../clustering-1.0-SNAPSHOT.jar:~/CDH3/mahout/core/target/classes:~/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar:~/CDH3/mahout/math/target/mahout-math-0.6-SNAPSHOT.jar \
> >> >>> hw.mahout.kmeans.SimpleKMeansClustering
> >> >>> Exception in thread "main" java.lang.NoClassDefFoundError:
> >> >>> org/apache/mahout/common/distance/DistanceMeasure
> >> >>> Caused by: java.lang.ClassNotFoundException:
> >> >>> org.apache.mahout.common.distance.DistanceMeasure
> >> >>>    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >> >>>    at java.security.AccessController.doPrivileged(Native Method)
> >> >>>    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >> >>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >> >>>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >> >>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >> >>> Could not find the main class: hw.mahout.kmeans.SimpleKMeansClustering.
> >> >>> Program will exit.
> >> >>> pd@PeriyaData:~/Mahout/clustering/target/classes$
> >> >>>
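One cause worth ruling out in the command above (an observation about shell behavior, not a confirmed diagnosis of this particular setup): bash performs tilde expansion at the start of a word, but after a `:` only in variable assignments and assignment-like words. Inside an ordinary argument such as the `-cp` value, each `~/CDH3/...` that follows a colon reaches `java` literally, and the JVM does not expand `~` either, so every Mahout entry in that classpath would point nowhere and `DistanceMeasure` (from mahout-core) could not be loaded. The small bash demonstration below prints what a command actually receives; `show_arg` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print back exactly the argument the command received.
show_arg() { printf '%s\n' "$1"; }

# '~' after ':' inside a plain argument is NOT expanded by bash.
literal=$(show_arg ../app.jar:~/lib/core.jar)

# $HOME is expanded wherever it appears, including inside double quotes.
expanded=$(show_arg "../app.jar:$HOME/lib/core.jar")

echo "as typed with ~ : $literal"
echo "with \$HOME     : $expanded"
```

Applied to the command above, that means writing `$HOME/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar` (and likewise for the other entries) instead of `~/CDH3/...`.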
> >> >>> =============================
> >> >>> Questions:
> >> >>>
> >> >>>
> >> >>>   1. I do not have the mahout-utils.jar file, for some strange reason.
> >> >>>   I am using Mahout 0.6. I tried recompiling Mahout twice using "mvn
> >> >>>   clean install", but I still cannot find mahout-utils-0.6.jar.
> >> >>>   Perhaps that is the problem. I do have mahout-core, mahout-examples
> >> >>>   and mahout-math.
> >> >>>   2. Is the "java -cp ..." command syntax shown above correct? Please
> >> >>>   advise.
> >> >>>   3. Is my pom.xml sufficient for this build? Please note that in
> >> >>>   pom.xml, I have mahout-core and the other Mahout artifacts at version
> >> >>>   0.5. For some strange reason, if I use 0.6, the Maven build fails and
> >> >>>   complains that 4 artifacts are missing: the mahout-core, mahout-math,
> >> >>>   mahout-utils and mahout-examples jar files. Is there a fix for this?
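To answer "which jar, if any, provides the class named in the error?", one can list a jar's entries with the JDK's `jar tf` command and look for the corresponding `.class` file. A minimal bash sketch; `class_to_entry` and `jar_has_class` are hypothetical helper names. It only inspects top-level entries, so a Hadoop `.job` file that bundles its dependencies as nested jars under `lib/` would need those nested jars checked separately.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a fully qualified class name to the path its
# .class file has inside a jar, e.g. a.b.C -> a/b/C.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if the jar (or .job file) contains that class
# as a top-level entry. `jar tf` prints one entry name per line; grep -x
# requires a whole-line match and -F treats the name as a fixed string.
jar_has_class() {
  local jar="$1" cls="$2"
  jar tf "$jar" | grep -qxF "$(class_to_entry "$cls")"
}
```

For example, `jar_has_class "$HOME/CDH3/mahout/core/target/mahout-core-0.6-SNAPSHOT.jar" org.apache.mahout.common.distance.DistanceMeasure && echo found` reports whether that jar provides the missing class.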
> >> >>>
> >> >>>
> >> >>> ==================
> >> >>>
> >> >>> pom.xml
> >> >>>
> >> >>> <project xmlns="http://maven.apache.org/POM/4.0.0"
> >> >>>  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >> >>>  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
> >> >>>  <modelVersion>4.0.0</modelVersion>
> >> >>>
> >> >>>  <parent>
> >> >>>    <artifactId>mahout</artifactId>
> >> >>>    <groupId>org.apache.mahout</groupId>
> >> >>>    <version>0.4</version>
> >> >>>  </parent>
> >> >>>
> >> >>>
> >> >>>  <groupId>hw.mahout.kmeans</groupId>
> >> >>>  <artifactId>clustering</artifactId>
> >> >>>  <packaging>jar</packaging>
> >> >>>  <version>1.0-SNAPSHOT</version>
> >> >>>  <name>clustering</name>
> >> >>>  <url>http://maven.apache.org</url>
> >> >>>
> >> >>> <dependencies>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-core</artifactId>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-core</artifactId>
> >> >>>      <type>test-jar</type>
> >> >>>      <scope>test</scope>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-math</artifactId>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-math</artifactId>
> >> >>>      <type>test-jar</type>
> >> >>>      <scope>test</scope>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-utils</artifactId>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.mahout</groupId>
> >> >>>      <artifactId>mahout-examples</artifactId>
> >> >>>      <version>0.5</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>com.google.guava</groupId>
> >> >>>      <artifactId>guava</artifactId>
> >> >>>      <version>r03</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.thrift</groupId>
> >> >>>      <artifactId>libthrift</artifactId>
> >> >>>      <version>0.6.1</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.slf4j</groupId>
> >> >>>      <artifactId>slf4j-log4j12</artifactId>
> >> >>>      <version>1.5.11</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.apache.hadoop</groupId>
> >> >>>      <artifactId>zookeeper</artifactId>
> >> >>>      <version>3.3.1</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>org.twitter4j</groupId>
> >> >>>      <artifactId>twitter4j-stream</artifactId>
> >> >>>      <version>2.2.3</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>        <groupId>commons-io</groupId>
> >> >>>        <artifactId>commons-io</artifactId>
> >> >>>        <version>2.0.1</version>
> >> >>>        <type>jar</type>
> >> >>>        <scope>compile</scope>
> >> >>>    </dependency>
> >> >>>
> >> >>>    <dependency>
> >> >>>      <groupId>commons-logging</groupId>
> >> >>>      <artifactId>commons-logging</artifactId>
> >> >>>      <version>1.1.1</version>
> >> >>>    </dependency>
> >> >>>
> >> >>>  </dependencies>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> <!--
> >> >>>  <build>
> >> >>>    <plugins>
> >> >>>      <plugin>
> >> >>>        <groupId>org.apache.maven.plugins</groupId>
> >> >>>        <artifactId>maven-compiler-plugin</artifactId>
> >> >>>        <version>2.3.2</version>
> >> >>>        <configuration>
> >> >>>          <encoding>UTF-8</encoding>
> >> >>>          <source>1.6</source>
> >> >>>          <target>1.6</target>
> >> >>>          <optimize>true</optimize>
> >> >>>        </configuration>
> >> >>>      </plugin>
> >> >>>      <plugin>
> >> >>>        <groupId>org.apache.maven.plugins</groupId>
> >> >>>        <artifactId>maven-antrun-plugin</artifactId>
> >> >>>        <version>1.6</version>
> >> >>>      </plugin>
> >> >>>      <plugin>
> >> >>>        <groupId>org.apache.maven.plugins</groupId>
> >> >>>        <artifactId>maven-resources-plugin</artifactId>
> >> >>>        <version>2.4.3</version>
> >> >>>        <configuration>
> >> >>>          <encoding>UTF-8</encoding>
> >> >>>        </configuration>
> >> >>>      </plugin>
> >> >>>
> >> >>>      <plugin>
> >> >>>        <groupId>org.apache.maven.plugins</groupId>
> >> >>>        <artifactId>maven-assembly-plugin</artifactId>
> >> >>>        <executions>
> >> >>>          <execution>
> >> >>>            <id>job</id>
> >> >>>            <phase>package</phase>
> >> >>>            <goals>
> >> >>>              <goal>single</goal>
> >> >>>            </goals>
> >> >>>            <configuration>
> >> >>>              <descriptors>
> >> >>>                <descriptor>src/main/assembly/job.xml</descriptor>
> >> >>>              </descriptors>
> >> >>>            </configuration>
> >> >>>          </execution>
> >> >>>          <execution>
> >> >>>            <id>my-jar-with-dependencies</id>
> >> >>>            <phase>package</phase>
> >> >>>            <goals>
> >> >>>              <goal>single</goal>
> >> >>>            </goals>
> >> >>>            <configuration>
> >> >>>              <descriptorRefs>
> >> >>>                <descriptorRef>jar-with-dependencies</descriptorRef>
> >> >>>              </descriptorRefs>
> >> >>>            </configuration>
> >> >>>          </execution>
> >> >>>        </executions>
> >> >>>      </plugin>
> >> >>>    </plugins>
> >> >>>  </build>
> >> >>>
> >> >>> -->
> >> >>>
> >> >>> </project>
> >> >>>
> >> >>>
> >> >>>
> >> >>> Thanks very much,
> >> >>>
> >> >>> PD.
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Lance Norskog
> >> >> goksron@gmail.com
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> goksron@gmail.com
> >>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
