mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Failure to run Clustering example
Date Wed, 29 Apr 2009 17:20:46 GMT
Hi Shashi,

That does sound like a JDK version problem. Most jobs require an initial
step to get the input into the vector format that the clustering code
expects. For the syntheticcontrol examples,
/Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java
calls an InputDriver that does this. You would need to do something
similar to massage your data into Mahout Vector format before you can run
the clustering job of your choosing.
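
As a rough illustration of that kind of conversion step, here is a minimal
sketch in plain Java. It assumes each input line holds one data point as
whitespace-separated numbers, and it uses a plain double[] as a stand-in
for a Mahout Vector. The class and method names are hypothetical and are
not part of Mahout; the actual InputDriver may work differently.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: turns one text line into a dense
// array of doubles, the kind of value a vector would be built from.
class VectorLineParser {

    /** Parses one line of whitespace-separated numbers into a double[]. */
    static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0]; // blank line: no values
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException on non-numeric input.
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] point = parseLine("28.7812 34.4632  31.3381");
        System.out.println(Arrays.toString(point));
    }
}
```

In a real job, logic like this would sit in the map step, and the parsed
values would be wrapped in whatever Vector implementation the clustering
code expects.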

Jeff

Shashikant Kore wrote:
> Thanks for the response, Grant.
>
> Upgrading Hadoop didn't really help. Now I am not able to launch even
> the Namenode, JobTracker, ... as I am getting the same error. I suspect
> a version conflict somewhere, as there are two JDK versions on the box. I
> will try it out on another box which has only JDK 6.
>
> From the documentation of clustering, it is not clear how to get the
> vectors from text (or html) files. I suppose, you can get TF-IDF
> values by indexing this content with Lucene. How does one proceed from
> there? Any pointers on that are appreciated.
>
> --shashi
>
> On Tue, Apr 28, 2009 at 8:40 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>   
>> On Apr 28, 2009, at 6:01 AM, Shashikant Kore wrote:
>>
>>     
>>> Hi,
>>>
>>> Initially, I got the version number error at the beginning. I found
>>> that the JDK version was 1.5. It has been upgraded to 1.6. Now
>>> JAVA_HOME points to /usr/java/jdk1.6.0_13/ and I am using Hadoop
>>> 0.18.3.
>>>
>>> 1. What could possibly be wrong? I checked the Hadoop script. Value of
>>> JAVA_HOME is correct (ie 1.6). Is it possible that somehow it is still
>>> using 1.5?
>>>       
>> I'm going to guess the issue is that you need Hadoop 0.19.
>>     
>>> 2. The last step of the clustering tutorial says "Get the data out of
>>> HDFS and have a look." Can you please point me to the Hadoop
>>> documentation on how to read this data?
>>>       
>> http://hadoop.apache.org/core/docs/current/quickstart.html towards the
>> bottom.  It shows some of the commands you can use w/ HDFS.  -get, -cat,
>> etc.
>>
>>
>> -Grant
>>
>>     
>
>
>   

