mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Writing java program for performing kmeans clustering on reuters dataset instead of ./mahout <seqdirectory | seq2sparse | kmeans| clusterdump > ,Steps to Follow
Date Wed, 04 Jan 2012 08:54:09 GMT
Run "mvn clean install -Dmaven.test.skip" in the parent directory (the one containing the parent pom.xml).

On 04-01-2012 14:21, rahul raghavendhra wrote:
> On Tue, Jan 3, 2012 at 4:05 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
>
>> Have you tried this link ?
>>
>> http://shuyo.wordpress.com/2011/02/14/mahout-development-environment-with-maven-and-eclipse-2/
>>
>>
>>> It explains how to import the Mahout in Action examples into Eclipse.
>>> Just add the Hadoop and Mahout dependencies to pom.xml. There is a small
>>> Mahout in Action example that runs k-means clustering, and you can use that.
> Thanks for your reply. I tried this to set up Mahout in Eclipse, but I got
> errors in mahout-core, mahout-buildtools, mahout-examples, mahout-math, etc.
> All of them show errors when imported from the file system (I have run mvn
> eclipse:eclipse and added the Maven repository too). Why do these errors occur?
>
>
> Thanks in advance. Please help.
>
> ./rahul
>
>
> On Tue, Jan 3, 2012 at 4:01 PM, Paritosh Ranjan<pranjan@xebia.com>  wrote:
>>> I think mahout-core (and its internal dependencies) can do most of what
>>> you need.
>>>
>>> You will have to create your vectors yourself and write to HDFS.
>>>
>>> Then use KMeansDriver's run method to do clustering.
>>>
>>> Then use ClusterOutputPostProcessor to separate out vectors belonging to
>>> different clusters in their specific directories.
>>>
>>> Then write some code to read the vectors from the cluster-specific directories.
>>>
>>> PS: Reading from and writing to HDFS is simple.
>>>
>>>
>>> On 03-01-2012 15:57, rahul raghavendhra wrote:
>>>
>>>> I am new to Mahout. I have checked out the trunk from svn and built it
>>>> with mvn. Now I wish to write a Java program (instead of the shell
>>>> scripts build-reuters.sh/cluster-reuters.sh) that performs k-means
>>>> clustering by calling the methods of, or (if possible) creating
>>>> instances of, the classes that convert the dataset into sequence files
>>>> and then into sparse vectors, then apply k-means, and then run the
>>>> cluster dump and display the k-means result.
>>>>
>>>> That is, to perform k-means clustering on the Reuters dataset without using
>>>> ./mahout <seqdirectory | seq2sparse | kmeans | clusterdump>
>>>>
>>>> Which jars need to be imported?
>>>>
>>>> Can that program be developed in Eclipse and run from there?
>>>>
>>>> Kindly help. What are the steps to follow?
>>>>
>>>> Thanks in advance.
>>>>
>>>> ./rahul
>>>>
>>>>
>>>>
>>>>
>>>
>
>

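The steps listed earlier in the thread (build the vectors, run k-means, then separate the vectors per cluster) can be illustrated without any Mahout or Hadoop dependency. What follows is only a minimal plain-Java sketch of that flow on made-up toy data, not Mahout's API: the class name KMeansSketch and all of its methods are hypothetical, and in a real setup KMeansDriver and ClusterOutputPostProcessor would take over these roles on vectors stored in HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it does not use or mirror Mahout's API.
final class KMeansSketch {

    // Squared Euclidean distance between two dense vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's iterations: assign each point to its nearest centroid, then move
    // each centroid to the mean of its points. Centroids are updated in place.
    // Assumes maxIterations >= 1 and at least one point.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dim = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != assignment[p]) {
                    assignment[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) { // leave an empty cluster's centroid unchanged
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    // Groups point indices by cluster, similar in spirit to writing each
    // cluster's vectors to its own directory.
    static List<List<Integer>> groupByCluster(int[] assignment, int k) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            groups.add(new ArrayList<>());
        }
        for (int p = 0; p < assignment.length; p++) {
            groups.get(assignment[p]).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Toy 2-D vectors standing in for document vectors.
        double[][] points = {
            {1.0, 1.0}, {1.5, 2.0}, {1.0, 0.5},
            {8.0, 8.0}, {9.0, 9.5}, {8.5, 9.0}
        };
        // Initial centroids copied from two of the points.
        double[][] centroids = {points[0].clone(), points[3].clone()};
        int[] assignment = cluster(points, centroids, 10);
        List<List<Integer>> groups = groupByCluster(assignment, centroids.length);
        for (int c = 0; c < groups.size(); c++) {
            System.out.println("cluster " + c + ": points " + groups.get(c));
        }
    }
}
```

Running main prints, for each cluster, the indices of the toy points assigned to it. The distance function, the initial centroids, and the iteration limit are the same kinds of choices that have to be made when calling a real k-means implementation.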
