mahout-user mailing list archives

From sharath jagannath <sharathjagann...@gmail.com>
Subject Re: Need help: beginner
Date Wed, 02 Feb 2011 22:37:34 GMT
Another question on this:
How should I package all of Mahout as a jar?
Will mvn package be enough?

--Sharath

On Wed, Feb 2, 2011 at 2:07 PM, sharath jagannath <
sharathjagannath@gmail.com> wrote:

> Hey All,
>
> It is me again, probably with another stupid query, but I am having a hard
> time getting going.
> I have now installed and configured both Mahout and Hadoop, and ran all the
> examples in the quickstart.
> Now I want to start writing my own code to cluster data and am wondering
> where to start.
>
> I should admit that I am new to Hadoop too; I have gone through their
> wordcount quickstart app.
>
> I am taken aback by the vastness of Mahout and need some assistance to
> get started.
>
>
> My data stream format (data can come from network or disk):
>
>
> String1 label1:rating/relevance label2:r label3:r
>
> String2 label1:rating/relevance label2:r label3:r label4:r
>
> String3 label1:rating/relevance label2:r
>
>
> From my reading I have learnt that I need to convert this text file to
> tf-idf vectors using one of the Vectorizer classes.
>
> I thought starting with the clustering example would be a good place. I
> imported the entire Mahout distribution as a Maven project in Eclipse and
> executed Job.java under clustering.syntheticcontrol.kmeans.
>
> But I got this exception, and I am not sure why. I have set JAVA_HOME,
> HADOOP_HOME and HADOOP_CONF_DIR, but the app is still looking for the data
> in the current folder.
>
>
> Feb 2, 2011 1:40:27 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Running with default arguments
> Feb 2, 2011 1:41:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Preparing Input
> Feb 2, 2011 1:42:27 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
> Feb 2, 2011 1:45:35 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> Feb 2, 2011 1:45:35 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/sjagannath/mahout-distribution-0.4/examples/testdata
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>     at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:133)
>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:58)
>
>
>
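
A note on the trace above: the missing path starts with file:/ and ends in examples/testdata, which suggests that a relative input path ("testdata") was resolved on the local filesystem against the working directory of the Eclipse run. One likely cause, though this is an assumption and not something the log confirms, is that the Hadoop configuration files are not on the classpath of the Eclipse launch, so the job falls back to the local filesystem. The sketch below uses only the Java standard library (no Hadoop calls) to show how such a relative path resolves; the class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InputPathCheck {

    // Resolve a relative path against a given working directory,
    // which is how a local-filesystem run would interpret it.
    static Path resolveAgainstWorkingDir(String workingDir, String relative) {
        return Paths.get(workingDir).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // "user.dir" is the directory the JVM was launched from
        // (in Eclipse, usually the project directory).
        String cwd = System.getProperty("user.dir");
        Path input = resolveAgainstWorkingDir(cwd, "testdata");

        System.out.println("Resolved input path: " + input);
        System.out.println("Exists: " + Files.exists(input));
    }
}
```

If this prints a path under the examples directory and "Exists: false", then creating a testdata directory there with the input file, or passing an explicit input path, would be the first things to try.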
> Given this, I need to know what is happening here and where I should start
> to vectorize my data.
>
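
As a possible starting point for the vectorizing step, here is a minimal sketch of parsing one line of the format quoted above into an id plus a label-to-weight map. It assumes tokens are whitespace-separated, the first token is the item id, and every rating/relevance value is numeric; none of this is confirmed beyond the three sample lines. The class and method names are hypothetical and no Mahout API is used. Each map could later be turned into a sparse vector with one dimension per distinct label.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabeledLineParser {

    // Returns the first whitespace-separated token, assumed to be the item id.
    static String parseId(String line) {
        return line.trim().split("\\s+")[0];
    }

    // Parses the remaining "label:weight" tokens into a map, keeping input order.
    static Map<String, Double> parseWeights(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<String, Double> weights = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            // Split on the last ':' so labels containing ':' still parse.
            int sep = tokens[i].lastIndexOf(':');
            if (sep <= 0 || sep == tokens[i].length() - 1) {
                throw new IllegalArgumentException(
                        "Expected label:weight but got: " + tokens[i]);
            }
            String label = tokens[i].substring(0, sep);
            double weight = Double.parseDouble(tokens[i].substring(sep + 1));
            weights.put(label, weight);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Sample line with made-up numeric weights in place of "r".
        String line = "String1 label1:4.0 label2:2.5 label3:1.0";
        System.out.println(parseId(line) + " -> " + parseWeights(line));
    }
}
```

Since the lines already carry a weight per label, the weights may be usable directly as vector entries, with tf-idf reweighting as an optional later step.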
