mahout-user mailing list archives

From Radek Maciaszek <radek.macias...@gmail.com>
Subject Re: Transforming data for k-means analysis
Date Thu, 09 Sep 2010 11:49:23 GMT
Hi Jeff,

Phew! I managed to wrap the vectors with NamedVector. I also had to modify
the ClusterDumper slightly to make it aware of NamedVector, so that both the
userId and the clusterId appear in the output. Most importantly, it seems to
work! I will stress test it with more data and let you know the results.

One thing I noticed is that instead of the expected 600 clusters I see only
175 in clusteredPoints. So far I have tested it with about 81k vectors. Is
this expected behaviour, or does it point to an error somewhere?
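
A quick way to examine the 600-versus-175 question is to count how many
distinct cluster IDs actually received points, and how large each one is. The
sketch below is only an illustration: it assumes the modified ClusterDumper
output has been exported as one "clusterId<TAB>userId" pair per line, which is
a hypothetical format and not something Mahout produces by default.

```python
from collections import Counter


def cluster_sizes(lines):
    """Count points per cluster from 'clusterId<TAB>userId' lines.

    The tab-separated layout is an assumed export format, not Mahout's own.
    """
    sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster_id, _user_id = line.split("\t", 1)
        sizes[cluster_id] += 1
    return sizes


# Toy data standing in for the real dump.
sample = ["12\tuser_a", "12\tuser_b", "40\tuser_c"]
sizes = cluster_sizes(sample)
print(len(sizes), "non-empty clusters")
print(sizes.most_common(2))
```

If the number of non-empty clusters is well below the requested k, one
possible explanation is that some centroids ended up with no points assigned,
so they never appear among the clustered points. That is a hypothesis to
check, not a confirmed diagnosis.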

I was planning to use Canopy for preprocessing, but I am not sure how to
choose the canopy parameters so that I get, for example, 600 clusters. It is
rather difficult for me to estimate the distances between points with
thousands of dimensions. Do you know of any rules of thumb that could help
here? I tried various parameters, but I always got just one cluster no matter
what I tried.
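
One generic heuristic for the canopy question above (not a rule from the
Mahout project) is to sample random pairs of points, look at the distribution
of their distances, and pick the canopy thresholds from that distribution.
Getting a single canopy every time usually suggests the tighter threshold (T2)
is larger than almost all pairwise distances. The sketch below assumes sparse
vectors held as {index: value} dicts and Euclidean distance; the function
names are hypothetical, and count_canopies is a simplified single-machine
version of canopy selection, not Mahout's implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as {index: value}."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def distance_quantiles(vectors, n_pairs=1000, quantiles=(0.05, 0.25, 0.5), seed=0):
    """Sample random pairs and return the requested quantiles of their distances."""
    rng = random.Random(seed)
    dists = sorted(euclidean(*rng.sample(vectors, 2)) for _ in range(n_pairs))
    return {q: dists[min(int(q * len(dists)), len(dists) - 1)] for q in quantiles}


def count_canopies(vectors, t2):
    """Count canopies: each new center removes every point closer than t2."""
    remaining = list(vectors)
    count = 0
    while remaining:
        center = remaining.pop(0)
        count += 1
        remaining = [v for v in remaining if euclidean(center, v) >= t2]
    return count


# Toy data: 200 random sparse vectors in a 1000-dimensional space.
rng = random.Random(1)
toy = [{rng.randrange(1000): rng.random() for _ in range(20)} for _ in range(200)]

q = distance_quantiles(toy)
print("distance quantiles:", q)
for t2 in sorted(q.values()):
    print("t2 =", round(t2, 3), "->", count_canopies(toy, t2), "canopies")
```

Running this on a sample of the real vectors and trying T2 values around the
lower quantiles shows how quickly the canopy count changes, which gives a
starting range to try with the actual Canopy job. Other distance measures
would need their own version of the euclidean function.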

Jeff, many thanks for all your help! Rui, as promised I will write up a
quick tutorial in a few weeks' time - my MSc has priority at the moment.

Best,
Radek

On 8 September 2010 17:53, Jeff Eastman <jdog@windwardsolutions.com> wrote:

>  Hi Radek,
>
> The clustering code is pretty stable, but we have been having some unit test
> failures in unrelated code that may frustrate you. I suggest you do a
> trunk checkout and then run "mvn clean install -DskipTests=true" to get a
> build without running all the tests. After that, I suggest running
> "examples/bin/build-reuters.sh", which will get you a dataset that you can
> explore using the mahout command line API. If you are already past that and
> are still having problems, let me know and I will try to help.
>
> Jeff
>
>
>
> On 9/8/10 1:52 AM, rmx wrote:
>
>> Hi Radek,
>> If you could post a tutorial, it would be fantastic.
>> I am a Machine Learning researcher without enough Java programming skills
>> to dig into the code.
>> I find Mahout's potential really impressive, and if I could manage to get
>> it working I would try to convince the rest of my research group to use it.
>>
>> Hi Jeff, yes, the problems I had were with the non-trunk version. 2 or 3
>> weeks ago I tried to install trunk but I got some errors in the installation
>> tests. I will try again, since there is probably a new version.
>>
>> Thanks
>> Rui
>>
>
>
