mahout-user mailing list archives

From Shivani Rao <>
Subject R and Mahout integration
Date Wed, 17 Nov 2010 22:25:07 GMT
I am an R user and am now using Mahout to run ML algorithms on big datasets that
are out of reach of R.
R has a hadoopstreaming package, and I was wondering whether an interface
between Mahout and R has been developed.

My question arises from the fact that the Lucene vectors/sparse matrices
created by Mahout are unintelligible to me unless there is a way to access them
in R.

I have just tested Apache Mahout by building a latent Dirichlet allocation
model on a corpus of 30 documents. I did not have Hadoop installed on the
system, so Mahout ran locally and produced the resulting model. I would like to
access the model parameters, that is, the estimated \alpha, \beta, \Phi and
\Theta.

How can I access these?

<Mahout bin location>/mahout lda -i <tf-vectors location>/tf-vectors -o
<lda-out-dir> -k 4 -v 27

I can see that <lda-out-dir> has a folder <state-i> for each iteration (I
presume) of the learning algorithm. Each <state-i> contains a single file,
part-r-0000, which I do not know how to access.
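
As a first step toward inspecting such a file, here is a minimal Python sketch
that reads only the file header. It assumes (this is not confirmed anywhere
above) that part-r-0000 is a Hadoop SequenceFile with the header layout used by
SequenceFile version 4 and later: the magic bytes "SEQ", one version byte, and
then the key and value class names, each stored as a length followed by UTF-8
bytes. The function name read_sequencefile_header is hypothetical, and the
sketch does not decode the records themselves.

```python
import struct


def read_sequencefile_header(stream):
    """Return (version, key_class, value_class) from a SequenceFile header.

    Assumes the header layout of SequenceFile version 4 or later, where the
    class names are stored as length-prefixed UTF-8 strings.
    """
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a Hadoop SequenceFile: missing 'SEQ' magic bytes")
    version = stream.read(1)[0]

    def read_short_text(s):
        # The string length is stored as a variable-length integer; lengths
        # from 0 to 127 fit in a single signed byte, which covers typical
        # Java class names. Longer strings are not handled in this sketch.
        length = struct.unpack("b", s.read(1))[0]
        if length < 0:
            raise ValueError("multi-byte length not handled in this sketch")
        return s.read(length).decode("utf-8")

    key_class = read_short_text(stream)
    value_class = read_short_text(stream)
    return version, key_class, value_class
```

Opening part-r-0000 in binary mode and passing the file object to this function
would show which key and value classes the file holds, which in turn indicates
what is needed to decode the records.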

Do I need to use HBase to be able to access the data generated by Mahout?

I apologize if my naive questions annoy you; I am new to Mahout.

