spark-dev mailing list archives

From Li Li <fancye...@gmail.com>
Subject running lda in spark throws exception
Date Mon, 28 Dec 2015 03:26:43 GMT
I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2.
It throws an exception at the line:   Matrix topics = ldaModel.topicsMatrix();
However, the YARN job history UI reports the job as successful. What is wrong?
I submit the job with:
./bin/spark-submit --class Myclass \
    --master yarn-client \
    --num-executors 2 \
    --driver-memory 4g \
    --executor-memory 4g \
    --executor-cores 1 \


My code:

    corpus.cache();

    // Cluster the documents into topicNumber topics using LDA
    DistributedLDAModel ldaModel = (DistributedLDAModel) new LDA()
        .setOptimizer("em")
        .setMaxIterations(iterNumber)
        .setK(topicNumber)
        .run(corpus);

    // Output topics. Each is a distribution over words (matching word count vectors)
    System.out.println("Learned topics (as distributions over vocab of "
        + ldaModel.vocabSize() + " words):");

    // Line 81, exception here:
    Matrix topics = ldaModel.topicsMatrix();
    for (int topic = 0; topic < topicNumber; topic++) {
      System.out.print("Topic " + topic + ":");
      for (int word = 0; word < ldaModel.vocabSize(); word++) {
        System.out.print(" " + topics.apply(word, topic));
      }
      System.out.println();
    }

    ldaModel.save(sc.sc(), modelPath);


Exception in thread "main" java.lang.IndexOutOfBoundsException: (1025,0) not in [-58,58) x [-100,100)
        at breeze.linalg.DenseMatrix$mcD$sp.update$mcD$sp(DenseMatrix.scala:112)
        at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:534)
        at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:531)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix$lzycompute(LDAModel.scala:531)
        at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix(LDAModel.scala:523)
        at com.mobvoi.knowledgegraph.textmining.lda.ReviewLDA.main(ReviewLDA.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

15/12/23 00:01:16 INFO spark.SparkContext: Invoking stop() from shutdown hook
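
A note on reading the exception message: Breeze reports an out-of-range access as (row,col) not in [-rows,rows) x [-cols,cols), so the matrix being filled here appears to be 58 x 100 while row index 1025 was written. That may mean some term index in the corpus is larger than the vocabulary size the model is using, for example if the document vectors do not all have the same size, though I have not confirmed this. Below is a minimal sketch of a check for that condition in plain Java, without Spark. The class TermIndexCheck, the method firstOutOfRange, and the sample arrays are hypothetical and only illustrate the bounds test; in a real job the indices would come from the corpus vectors.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Spark or Breeze.
final class TermIndexCheck {

    /**
     * Returns the first term index that falls outside [0, vocabSize),
     * or an empty OptionalInt if every index fits.
     */
    static OptionalInt firstOutOfRange(int[][] docTermIndices, int vocabSize) {
        for (int[] doc : docTermIndices) {
            for (int term : doc) {
                if (term < 0 || term >= vocabSize) {
                    return OptionalInt.of(term);
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up sample data: two documents given as arrays of term indices.
        int[][] docs = { {0, 5, 57}, {3, 1025} };
        int vocabSize = 58; // row count taken from the exception message

        OptionalInt bad = firstOutOfRange(docs, vocabSize);
        if (bad.isPresent()) {
            System.out.println("Out-of-range term index: " + bad.getAsInt());
        } else {
            System.out.println("All term indices fit within the vocabulary.");
        }
    }
}
```

If a check like this finds an offending index, the vectors passed to LDA would need to be rebuilt with a single shared vocabulary size before training.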


