Hello
I am an R user and am now using Mahout to run ML algorithms on big datasets
that are out of reach of R.
R has the HadoopStreaming package, and I was wondering whether an interface
between Mahout and R has been developed.
My question arises from the fact that the Lucene vectors/sparse matrices
created by Mahout are unintelligible to me unless there is a way to access
them in R.
I have just tested Apache Mahout by building a latent Dirichlet allocation
(LDA) model on a corpus of 30 documents. I did not have Hadoop installed on
the system, so Mahout ran locally and produced the resulting model. I would
like to access the model parameters, i.e. the estimated \alpha, \beta, \Phi
and \Theta.
How can I access these?
The command I ran was:
<Mahout bin location>/mahout lda -i <tf-vectors location>/tf-vectors -o <lda-out-dir> -k 4 -v 27
I can see that <lda-out-dir> has a folder <state-i> for each iteration (I
presume) of the learning algorithm. Each <state-i> contains a single file,
part-r-0000, which I do not know how to read.
Do I need to use HBase to be able to access the data generated by Mahout?
I apologize if my questions are naive; I am new to Mahout.
Regards,
Shivani