spark-issues mailing list archives

From "Qi Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5
Date Mon, 21 Mar 2016 19:57:25 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204983#comment-15204983 ]

Qi Dai commented on SPARK-13289:
--------------------------------

I'm trying to test this with the current master branch and the nightly build on YARN, but Spark
always fails to start with java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.history.YarnHistoryService.
Does anyone have any idea what causes this? It looks like no one else has reported it. Should
I open a separate issue for it?

The stack trace is:
java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.history.YarnHistoryService
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:177)
  at org.apache.spark.scheduler.cluster.SchedulerExtensionServices$$anonfun$start$5.apply(SchedulerExtensionService.scala:109)
  at org.apache.spark.scheduler.cluster.SchedulerExtensionServices$$anonfun$start$5.apply(SchedulerExtensionService.scala:108)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
  at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.scheduler.cluster.SchedulerExtensionServices.start(SchedulerExtensionService.scala:108)
  at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.start(YarnSchedulerBackend.scala:81)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
  at org.apache.spark.repl.Main$.createSparkContext(Main.scala:89)
  ... 48 elided
java.lang.NullPointerException
  at org.apache.spark.sql.SQLContext$.createListenerAndUI(SQLContext.scala:1036)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:91)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.repl.Main$.createSQLContext(Main.scala:99)
  ... 48 elided
<console>:13: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:13: error: not found: value sqlContext
       import sqlContext.sql
              ^
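
The trace above shows the class being looked up by name through Class.forName. As a minimal sketch for checking whether a class name resolves on the current classpath, the snippet below can be pasted into the shell. The helper name missingClasses is hypothetical and not part of Spark, and this only diagnoses the classpath; it does not say where the class name is configured.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): returns the class names that
// cannot be loaded by name from the current classpath.
def missingClasses(classNames: Seq[String]): Seq[String] =
  classNames.filter(name => Try(Class.forName(name)).isFailure)

// The second name is the one from the stack trace above.
val candidates = Seq(
  "java.lang.String",
  "org.apache.spark.deploy.yarn.history.YarnHistoryService"
)

missingClasses(candidates).foreach(name => println(s"not on classpath: $name"))
```

If the class is reported as missing, the build in use presumably does not ship it, so whatever setting names it would need to be removed or the matching jar added.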

> Word2Vec generate infinite distances when numIterations>5
> ---------------------------------------------------------
>
>                 Key: SPARK-13289
>                 URL: https://issues.apache.org/jira/browse/SPARK-13289
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.0
>         Environment: Linux, Scala
>            Reporter: Qi Dai
>              Labels: features
>
> I recently ran some word2vec experiments on a large text dataset, on a cluster with 50 executors,
> and found that when the number of iterations is larger than 5, the distances between
> words are all infinite. My code looks like this:
> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
> // Each input line becomes one sentence: a sequence of space-separated tokens.
> val text = sc.textFile("/project/NLP/1_biliion_words/train").map(_.split(" ").toSeq)
> // numIterations = 10; the problem appears once this exceeds 5.
> val word2vec = new Word2Vec().setMinCount(25).setVectorSize(96).setNumPartitions(99).setNumIterations(10).setWindowSize(5)
> val model = word2vec.fit(text)
> // Print the 40 nearest words to "who" with their similarity scores.
> val synonyms = model.findSynonyms("who", 40)
> for ((synonym, cosineSimilarity) <- synonyms) {
>   println(s"$synonym $cosineSimilarity")
> }
> The results are: 
> to Infinity
> and Infinity
> that Infinity
> with Infinity
> said Infinity
> it Infinity
> by Infinity
> be Infinity
> have Infinity
> he Infinity
> has Infinity
> his Infinity
> an Infinity
> ) Infinity
> not Infinity
> who Infinity
> I Infinity
> had Infinity
> their Infinity
> were Infinity
> they Infinity
> but Infinity
> been Infinity
> I tried many different datasets and different words for finding synonyms.
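
One way to narrow this down is to check whether the learned vectors themselves already contain non-finite values, rather than only the similarity scores. The snippet below is a minimal sketch under that assumption; the helper name nonFiniteWords is hypothetical, and the sample map is made-up data standing in for the real vectors.

```scala
// Hypothetical helper (not a Spark API): returns, in sorted order, the words
// whose vectors contain at least one NaN or infinite component.
def nonFiniteWords(vectors: Map[String, Array[Float]]): Seq[String] =
  vectors.collect {
    case (word, vec) if vec.exists(x => x.isNaN || x.isInfinity) => word
  }.toSeq.sorted

// Made-up sample data for illustration only.
val sample: Map[String, Array[Float]] = Map(
  "who"  -> Array(0.1f, Float.PositiveInfinity),
  "said" -> Array(0.2f, -0.3f)
)

println(nonFiniteWords(sample))
```

With the model from the report, the same check could be run as nonFiniteWords(model.getVectors), since getVectors on the spark.mllib Word2VecModel returns a Map[String, Array[Float]]. A non-empty result would suggest the vectors overflowed during training rather than during the synonym lookup.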



