spark-issues mailing list archives

From "yuangang.liu (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (SPARK-12016) word2vec load model can't use findSynonyms to get words
Date Mon, 30 Nov 2015 14:52:11 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuangang.liu reopened SPARK-12016:
----------------------------------

Sorry for my late update.

> word2vec load model can't use findSynonyms to get words 
> --------------------------------------------------------
>
>                 Key: SPARK-12016
>                 URL: https://issues.apache.org/jira/browse/SPARK-12016
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: ubuntu 14.04
>            Reporter: yuangang.liu
>
> I use Word2Vec.fit to train a Word2VecModel and then save the model to the file system. When
> I load the model from the file system, I can use transform('a') to get a vector, but I
> can't use findSynonyms('a', 2) to get any words.
> I use the following code to test word2vec:
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import os, tempfile
> from shutil import rmtree
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>     syms = model.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>     print model.transform("a") == sameModel.transform("a")
>     syms = sameModel.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     try:
>         rmtree(path)
>     except OSError:
>         pass
> The first print statement outputs "[u'b', u'c']".
> The second outputs "True" and the third outputs "[u'__class__']".
> I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
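
For reference, findSynonyms is documented to return words ranked by cosine similarity to the query word's vector, so a model loaded from disk would be expected to return the same neighbors as the original. The following is a minimal pure-Python sketch of that expected behavior. The toy vectors and the helper names cosine and find_synonyms are hypothetical illustrations, not Spark's implementation, and the sketch assumes every vector is non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_synonyms(vectors, word, num):
    """Return the `num` (word, similarity) pairs closest to `word`.

    `vectors` is a plain dict mapping each word to its vector; this is an
    assumption of the sketch, not the storage format Spark uses.
    """
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    # Highest similarity first, mirroring the ordering findSynonyms returns.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Hypothetical toy vectors, not taken from the model trained in this report.
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [-1.0, 0.0],
}
synonyms = find_synonyms(toy_vectors, "a", 2)
```

With these toy vectors, the two nearest neighbors of "a" are "b" and "c", which is the shape of result the reloaded model should also produce.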



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


