mahout-user mailing list archives

From Chris Harrington <ch...@heystaks.com>
Subject Re: cvb vectordump
Date Fri, 19 Apr 2013 12:30:06 GMT
I just ran vectordump over the output from cvb, but I have no idea what I'm looking at:

{1.0:0.0689751034234147,0hu:0.052798138507741114,06:0.046108327846619585,091:0.04079964524901706,1:0.03488226667358313,10g:0.03471651100042406,07:0.03051583712303273,10.30am:0.029957963431693112,1171:0.028424194208528646,10.4.10:0.028173810240271588}

Can someone give me an explanation of the above?

In the Mahout in Action book there was a table which displayed each topic with its top terms. How would I go from the above to something like that? i.e.
topic 0 -> term1, term2, term3 … termN
topic 1 -> term1, term2, term3 … termN
etc.
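
For what it's worth, this is the kind of thing I'm imagining, as a rough sketch in Java. I'm assuming the cvb -o output is a SequenceFile of IntWritable topic id -> VectorWritable term weights, and that the seq2sparse dictionary is a SequenceFile of Text term -> IntWritable index; the part file name and the class name below are just placeholders.

import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class PrintTopTerms {                         // placeholder class name
  public static void main(String[] args) throws Exception {
    int topN = 10;
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // dictionary.file-0 maps term (Text) -> index (IntWritable); invert it
    Map<Integer, String> dict = new HashMap<Integer, String>();
    SequenceFile.Reader dictReader = new SequenceFile.Reader(
        fs, new Path("contentDataDir/sparseVectors/dictionary.file-0"), conf);
    Text term = new Text();
    IntWritable index = new IntWritable();
    while (dictReader.next(term, index)) {
      dict.put(index.get(), term.toString());
    }
    dictReader.close();

    // cvb -o output: one term-weight vector per topic (part file name is a guess)
    SequenceFile.Reader topicReader = new SequenceFile.Reader(
        fs, new Path("cvb-output/part-m-00000"), conf);
    IntWritable topicId = new IntWritable();
    VectorWritable row = new VectorWritable();
    while (topicReader.next(topicId, row)) {
      // copy index/weight pairs out, since the iterator may reuse its Element object
      List<Map.Entry<Integer, Double>> weights = new ArrayList<Map.Entry<Integer, Double>>();
      Iterator<Vector.Element> it = row.get().iterateNonZero();
      while (it.hasNext()) {
        Vector.Element e = it.next();
        weights.add(new AbstractMap.SimpleEntry<Integer, Double>(e.index(), e.get()));
      }
      // sort by weight, descending, and keep the top N terms
      Collections.sort(weights, new Comparator<Map.Entry<Integer, Double>>() {
        public int compare(Map.Entry<Integer, Double> a, Map.Entry<Integer, Double> b) {
          return Double.compare(b.getValue(), a.getValue());
        }
      });
      StringBuilder line = new StringBuilder("topic " + topicId.get() + " -> ");
      for (int i = 0; i < Math.min(topN, weights.size()); i++) {
        if (i > 0) line.append(", ");
        line.append(dict.get(weights.get(i).getKey()));
      }
      System.out.println(line);
    }
    topicReader.close();
  }
}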


On 19 Apr 2013, at 10:19, Chris Harrington wrote:

> Found the issue: it was the folder I gave for outputting the matrix in the rowid command. For cvb I gave ./contentDataDir/matrix as the matrix location; instead I should have supplied ./contentDataDir/matrix/matrix.
> 
> On 17 Apr 2013, at 12:46, Chris Harrington wrote:
> 
>> So I've got 0.8 now, but I'm running into an error:
>> 
>> ../../workspace2/trunk/bin/mahout seqdirectory -i ./contentDataDir/output-content-segment -o ./contentDataDir/sequenced
>> 
>> ../../workspace2/trunk/bin/mahout seq2sparse -i ./contentDataDir/sequenced -o ./contentDataDir/sparseVectors --namedVector -wt tf
>> 
>> ../../workspace2/trunk/bin/mahout rowid -i ./contentDataDir/sparseVectors/tf-vectors/ -o ./contentDataDir/matrix
>> 
>> ../../workspace2/trunk/bin/mahout cvb -i ./contentDataDir/matrix -o cvb-output -k 100 -x 1 -dict ./contentDataDir/sparseVectors/dictionary.file-0 -dt cvb-topic-doc -mt cvb-topic-model
>> 
>> but the cvb command hits a ClassCastException:
>> 
>> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
>> 	at org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> 
>> I thought seq2sparse took care of turning Hadoop's Text into Mahout's VectorWritable. Where have I gone wrong?
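
An aside for anyone hitting the same cast error: as far as I can tell, the rowid output directory holds two sequence files, "matrix" (IntWritable keys, VectorWritable values) and "docIndex" (IntWritable keys, Text values), so handing cvb the directory rather than the matrix file inside it is exactly what produces the Text-to-VectorWritable cast above. A quick sanity check, sketched with plain Hadoop APIs; the class name is just a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckSeqFileTypes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // default to the file cvb should actually be reading
    Path p = new Path(args.length > 0 ? args[0] : "contentDataDir/matrix/matrix");
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, p, conf);
    System.out.println(p + ": key class = " + reader.getKeyClassName()
        + ", value class = " + reader.getValueClassName());
    reader.close();
  }
}

If the value class printed is Text rather than VectorWritable, cvb is being pointed at the wrong file.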
>> 
>> 
>> 
>> On 16 Apr 2013, at 14:45, Jake Mannix wrote:
>> 
>>> You should just be building off of trunk (0.8-snapshot), in which case you should be working just fine.
>>> 
>>> 
>>> On Tue, Apr 16, 2013 at 6:43 AM, Chris Harrington <chris@heystaks.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I've been trying to get the vector dumper to work on the output from cvb, but it's throwing lots of errors.
>>>> 
>>>> I found several old mails on the mailing list regarding this issue, specifically this one:
>>>> 
>>>> 
>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3CCAHSfFsy2oWRuzwVzGW57LRYaJ+LuudNu-W5EO0wnV_ff=UY6fg@mail.gmail.com%3E
>>>> 
>>>> That thread is a bit old, so I was wondering: is there a patch or anything to fix it, or do I need to use the 0.8-snapshot?
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> -jake
>> 
> 

