mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Arabic K-mean clustering
Date Fri, 18 Feb 2011 21:41:36 GMT
Aren't the Hudson build messages about this?

On Fri, Feb 18, 2011 at 9:46 AM, Matthew Runo <matthew.runo@gmail.com> wrote:
> This brings up a question I have:
>
> How often is trunk pushed up to the Apache Maven snapshot repo?
>
> <repository>
>     <snapshots>
>         <enabled>true</enabled>
>     </snapshots>
>     <name>Apache Snapshots</name>
>     <id>apache-snapshots</id>
>     <url>http://repository.apache.org/snapshots</url>
> </repository>
>
> <dependency>
>     <groupId>org.apache.mahout</groupId>
>     <artifactId>mahout</artifactId>
>     <version>0.5-SNAPSHOT</version>
> </dependency>
>
> Thanks!
>
> Matthew
>
> On Thu, Feb 17, 2011 at 9:44 PM, Lance Norskog <goksron@gmail.com> wrote:
>> Waleed: a fix for this was checked in on January 27. Are you using the
>> trunk, or the 0.4 release? Most people use the trunk, and they
>> generally recommend it. If you're on the trunk, it is time to do an
>> update to the latest code.
>>
>> Lance
>>
>> On Thu, Feb 17, 2011 at 3:16 PM, Shige Takeda <smtakeda@gmail.com> wrote:
>>> Hi, I believe the fix for the following bug already addresses the issue:
>>> https://issues.apache.org/jira/browse/MAHOUT-594
>>>
>>> Thanks, -- Shige
>>>
>>> On Thu, Feb 17, 2011 at 3:57 AM, WaleedAzmy <wazmy@tayait.com> wrote:
>>>
>>>>
>>>> Dear All...
>>>>
>>>> I tried to test Mahout k-means clustering on Arabic data, but I think
>>>> there is an encoding problem.
>>>>
>>>> I tried the following commands:
>>>> =======================
>>>>
>>>> $ ./mahout seqdirectory -i "....\Arabic_data" \
>>>>     -o "....\ArabicTest\Arabic_data-seqdir" -c UTF-8 -chunk 5
>>>>
>>>> $ ./mahout seq2sparse -i "....\ArabicTest\Arabic_data-seqdir" \
>>>>     -o "....\ArabicTest\Arabic_data_out-seqdir"
>>>>
>>>> $ ./mahout kmeans -i "....\ArabicTest\Arabic_data_out-seqdir\tfidf-vectors/" \
>>>>     -c "....\ArabicTest\clusters" -o "....\ArabicTest\arabic-kmeans" \
>>>>     -x 10 -k 20 -ow
>>>>
>>>> $ ./mahout clusterdump -s "....\ArabicTest\arabic-kmeans\clusters-1" \
>>>>     -d "....\ArabicTest\Arabic_data_out-seqdir\dictionary.file-0" \
>>>>     -dt sequencefile -b 100 -n 20
>>>>
>>>>
>>>> The clusterdump command generates the following output:
>>>> ===================================
>>>>
>>>> no HADOOP_HOME set, running locally
>>>> :VL-1{n=1 c=[24:6.187, 31:5.912, 53:7.643, 69:7.958, 77:8.365, ??:2.260,
>>>> ?????:5.627, ?????:5.627, ??
>>>>        Top Terms:
>>>>                ????           =>  11.830205917358398
>>>>                ?????          =>  10.808554649353027
>>>>                ???????        =>  8.93863296508789
>>>>                ?????          =>  8.93863296508789
>>>>                ???????        =>  8.93863296508789
>>>>                ???????        =>  8.93863296508789
>>>>                77             =>  8.365219116210938
>>>>                ????           =>  8.365219116210938
>>>>                ??????         =>  8.365219116210938
>>>>                ???????????    =>  8.365219116210938
>>>>                69             =>  7.958374977111816
>>>>                ?????          =>  7.6428022384643555
>>>>                53             =>  7.6428022384643555
>>>>                ???            =>  7.6428022384643555
>>>>                ???            =>  7.384960651397705
>>>>                ?????          =>  7.384960651397705
>>>>                ?????          =>  7.166958332061768
>>>>                24             =>  6.186699867248535
>>>>                31             =>  5.9121222496032715
>>>>                ?????          =>  5.627420902252197
>>>> :VL-104{n=1 c=[??:6.089, ????:5.404, ??????:3.795, ???????:5.915,
>>>> ??????:7.385, ????????:8.939, ?????
>>>>        Top Terms:
>>>>                ????????       =>  12.641136169433594
>>>>                ??????         =>  9.422260284423828
>>>>                ?????????      =>  8.93863296508789
>>>>                ????           =>  8.93863296508789
>>>>
>>>>
>>>> ===============================================================
>>>> I think the meaningless question marks (?) are caused by an encoding
>>>> problem. Can anyone help me with this?
>>>>
>>>> Also, I would like a tutorial that describes the k-means clustering command
>>>> and its options, and explains what the clusterdump output represents.
>>>>
>>>> Thank you....
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Arabic-K-mean-clustering-tp2518248p2518248.html
>>>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>



-- 
Lance Norskog
goksron@gmail.com
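
A note on the question marks in the clusterdump output quoted above: one common way Arabic text turns into `?` in a Java program is encoding it with a charset that cannot represent the characters. `String.getBytes(Charset)` substitutes the charset's replacement byte (`?` for US-ASCII) for every unmappable character. The thread does not say whether this is the exact mechanism fixed in MAHOUT-594, so the following is only a minimal sketch of the general effect and is not Mahout code. The class name `QuestionMarkDemo` and the method `roundTrip` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Mahout.
public class QuestionMarkDemo {

    // Encodes the text with the given charset, then decodes the bytes
    // again with the same charset.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // An Arabic word written with Unicode escapes, so the encoding
        // of this source file does not matter.
        String arabic = "\u0643\u062A\u0627\u0628";

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));

        // US-ASCII cannot represent Arabic letters, so each letter is
        // replaced by '?' during encoding.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII));
    }
}
```

If a similar effect is at work in a pipeline, the usual remedy is to pass an explicit UTF-8 charset wherever text is read or written instead of relying on the platform default.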