mahout-user mailing list archives

From vineet yadav <vineet.yadav.i...@gmail.com>
Subject Re: Need help: beginner
Date Wed, 02 Feb 2011 22:40:33 GMT
Hi Sharath,
    Can you post the exact arguments you passed when you called the job?
Thanks
Vineet Yadav

On Thu, Feb 3, 2011 at 3:37 AM, sharath jagannath <sharathjagannath@gmail.com> wrote:

> Hey All,
>
> It is me again, with what is probably another basic question, but I am
> having a hard time getting going.
> I have now installed and configured both Mahout and Hadoop, and I ran all
> the examples in the quickstart.
> Now I want to start writing my own code to cluster data, and I am wondering
> where to start.
>
> I should admit that I am new to Hadoop too; I went through their wordcount
> quickstart app.
>
> I am taken aback by the vastness of Mahout and need some assistance to
> get started.
>
>
> My data stream format (the data can come from the network or from disk):
>
> String1 label1:rating/relevance label2:r label3:r
> String2 label1:rating/relevance label2:r label3:r label4:r
> String3 label1:rating/relevance label2:r
>
>
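A minimal sketch of reading one line of the format quoted above into a sparse label-to-weight map, in plain Java with no Mahout or Hadoop classes. It assumes that the leading string contains no whitespace and that every rating is numeric; the class and method names (RatedLineParser, parseName, parseFeatures) are hypothetical and only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RatedLineParser {

    /** Returns the leading item name (assumed to contain no whitespace). */
    static String parseName(String line) {
        return line.trim().split("\\s+")[0];
    }

    /** Parses the "label:weight" tokens after the name into label -> weight. */
    static Map<String, Double> parseFeatures(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<String, Double> features = new LinkedHashMap<>();
        // tokens[0] is the item name; every later token is a label:weight pair.
        for (int i = 1; i < tokens.length; i++) {
            int sep = tokens[i].lastIndexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException(
                        "Expected label:weight, got " + tokens[i]);
            }
            String label = tokens[i].substring(0, sep);
            double weight = Double.parseDouble(tokens[i].substring(sep + 1));
            features.put(label, weight);
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up sample values; the real ratings are not given in the thread.
        String line = "String1 label1:0.5 label2:2 label3:1.5";
        System.out.println(parseName(line) + " -> " + parseFeatures(line));
    }
}
```

A map like this would still have to be turned into whatever vector type the clustering job expects; that step is not shown here.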
> From my reading I have learnt that I need to convert this text file to
> tf-idf vectors, using one of the Vectorizer classes.
>
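To make the term "tf-idf vector" concrete, here is a small sketch of the textbook weighting tf * ln(N / df) in plain Java. This is not Mahout's vectorizer and its weighting may differ in detail; the class name TfIdfSketch and the sample documents are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    /** Returns one term -> weight map per document, using tf * ln(N / df). */
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            // Term frequency: raw count of each term in this document.
            Map<String, Double> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1.0, Double::sum);
            }
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Double> e : tf.entrySet()) {
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(weights);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("red", "apple", "apple"),
                Arrays.asList("red", "car"));
        System.out.println(tfIdf(docs));
    }
}
```

A term that occurs in every document gets weight 0 under this formula, since ln(N / N) = 0. Note also that the input quoted earlier already carries a rating per label, so whether tf-idf reweighting is needed at all for that data is an open question.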
> I thought the clustering example would be a good place to start. I imported
> the entire Mahout distribution as a Maven project in Eclipse and executed
> Job.java under clustering.syntheticcontrol.kmeans.
>
> But I got this exception, and I am not sure why. I have set JAVA_HOME,
> HADOOP_HOME and HADOOP_CONF_DIR, but the app is still searching for the
> data in the current folder.
>
>
> Feb 2, 2011 1:40:27 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Running with default arguments
> Feb 2, 2011 1:41:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Preparing Input
> Feb 2, 2011 1:42:27 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
> Feb 2, 2011 1:45:35 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> Feb 2, 2011 1:45:35 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/sjagannath/mahout-distribution-0.4/examples/testdata
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>     at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:133)
>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:58)
>
>
>
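The `file:` prefix in the exception indicates that the job looked on the local filesystem, and the path ends in `examples/testdata`, which is consistent with a relative input path `testdata` being resolved against the directory the program was started from. The sketch below only mimics that resolution with java.nio so the location can be checked before running the job; it is an assumption about what happened, not Hadoop's own code, and the class name InputPathCheck is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InputPathCheck {

    /** Resolves a possibly relative input path against a working directory. */
    static Path resolve(String workingDir, String input) {
        return Paths.get(workingDir).resolve(input).normalize();
    }

    public static void main(String[] args) {
        // user.dir is the directory the JVM was launched from; in an IDE this
        // is typically the project folder unless the run configuration says otherwise.
        String workingDir = System.getProperty("user.dir");
        Path resolved = resolve(workingDir, "testdata");
        System.out.println("Would read from: " + resolved.toUri());
        System.out.println("Exists: " + Files.exists(resolved));
    }
}
```

If this prints a location with no data in it, then either the input has to be placed there or the job has to be given explicit input and output arguments. Environment variables such as HADOOP_CONF_DIR may also not reach a program launched from the IDE, which would be one possible reason for the local filesystem being used.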
> Given this:
> I need to know what is happening here, and where I should start in order
> to vectorize my data.
>
> --
> Thanks,
> Sharath Jagannath
>
