spark-user mailing list archives

From pun <punintended...@gmail.com>
Subject How to run MLlib's word2vec in CBOW mode?
Date Thu, 28 Sep 2017 13:55:45 GMT
Hello,
My understanding is that word2vec can be run in two modes:
continuous bag-of-words (CBOW) (order of words does not matter)
continuous skip-gram (order of words matters)
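To make the distinction concrete: as the two architectures are usually described, CBOW predicts a target word from its surrounding context words, while skip-gram predicts each context word from the target word. Below is a minimal sketch in plain Scala of the training examples each mode would build from one sentence. It does not use Spark's API, and the object and method names (`Word2VecModes`, `context`, `cbowExamples`, `skipGramPairs`) are hypothetical, chosen only for illustration.

```scala
object Word2VecModes {
  // Words within `window` positions of index i, excluding the word at i itself.
  def context(words: Seq[String], i: Int, window: Int): Seq[String] = {
    val lo = math.max(0, i - window)
    val hi = math.min(words.length - 1, i + window)
    (lo to hi).filter(_ != i).map(words)
  }

  // CBOW: one example per position, (all context words) -> target word.
  def cbowExamples(words: Seq[String], window: Int): Seq[(Seq[String], String)] =
    words.indices.map(i => (context(words, i, window), words(i)))

  // Skip-gram: one example per (target word, single context word) pair.
  def skipGramPairs(words: Seq[String], window: Int): Seq[(String, String)] =
    words.indices.flatMap(i => context(words, i, window).map(c => (words(i), c)))

  def main(args: Array[String]): Unit = {
    val sentence = Seq("the", "quick", "brown", "fox")
    cbowExamples(sentence, 1).foreach { case (ctx, target) =>
      println(s"CBOW: ${ctx.mkString(",")} -> $target")
    }
    skipGramPairs(sentence, 1).foreach { case (target, ctx) =>
      println(s"skip-gram: $target -> $ctx")
    }
  }
}
```

This only illustrates how the inputs and outputs are paired in each mode; it says nothing about how any particular library trains on them.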
I would like to run the *CBOW* implementation from Spark's MLlib, but it is
not clear to me from the documentation and their example how to do it. 
This is the example listed on their page. From:
https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html#example

import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)

val word2vec = new Word2Vec()

val model = word2vec.fit(input)

val synonyms = model.findSynonyms("1", 5)

for ((synonym, cosineSimilarity) <- synonyms) {
  println(s"$synonym $cosineSimilarity")
}

*My questions:*
Which of the two modes does this example use?
Do you know how I can run the model in the CBOW mode?
Thanks in advance!



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/