spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-12488) LDA describeTopics() Generates Invalid Term IDs
Date Thu, 03 Nov 2016 23:48:59 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley resolved SPARK-12488.
---------------------------------------
          Resolution: Fixed
            Assignee: Xiangrui Meng
       Fix Version/s: 1.6.1
                      2.0.0
                      1.5.3
                      1.4.2
    Target Version/s:   (was: 2.1.0)

> LDA describeTopics() Generates Invalid Term IDs
> -----------------------------------------------
>
>                 Key: SPARK-12488
>                 URL: https://issues.apache.org/jira/browse/SPARK-12488
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.5.2
>            Reporter: Ilya Ganelin
>            Assignee: Xiangrui Meng
>             Fix For: 1.4.2, 1.5.3, 2.0.0, 1.6.1
>
>
> When running an LDA model and calling describeTopics(), the returned term ID list contains invalid values.
> The example below generates 10 topics on a data set with a vocabulary of 685 terms.
> {code}
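>     import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
>
>     // docTermVector: RDD[(Long, Vector)] of (document ID, term-count vector) pairs, built elsewhere
>     // Set LDA parameters
>     val numTopics = 10
>     val lda = new LDA().setK(numTopics).setMaxIterations(10)
>     val ldaModel = lda.run(docTermVector)
>     // Cast to the distributed model (valid when the EM optimizer was used)
>     val distModel = ldaModel.asInstanceOf[DistributedLDAModel]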
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted.reverse
> res40: Array[Int] = Array(2064860663, 2054149956, 1991041659, 1986948613, 1962816105,
1858775243, 1842920256, 1799900935, 1792510791, 1792371944, 1737877485, 1712816533, 1690397927,
1676379181, 1664181296, 1501782385, 1274389076, 1260230987, 1226545007, 1213472080, 1068338788,
1050509279, 714524034, 678227417, 678227086, 624763822, 624623852, 618552479, 616917682, 551612860,
453929488, 371443786, 183302140, 58762039, 42599819, 9947563, 617, 616, 615, 612, 603, 597,
596, 595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579,
578, 577, 576, 575, 574, 573, 572, 571, 570, 569, 568, 567, 566, 565, 564, 563, 562, 561,
560, 559, 558, 557, 556, 555, 554, 553, 552, 551, 550, 549, 548, 547, 546, 545, 544, 543,
542, 541, 540, 539, 538, 537, 536, 535, 534, 533, 532, 53...
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted
> res41: Array[Int] = Array(-2087809139, -2001127319, -1979718998, -1833443915, -1811530305,
-1765302237, -1668096260, -1527422175, -1493838005, -1452770216, -1452508395, -1452502074,
-1452277147, -1451720206, -1450928740, -1450237612, -1448730073, -1437852514, -1420883015,
-1418557080, -1397997340, -1397995485, -1397991169, -1374921919, -1360937376, -1360533511,
-1320627329, -1314475604, -1216400643, -1210734882, -1107065297, -1063529036, -1062984222,
-1042985412, -1009109620, -951707740, -894644371, -799531743, -627436045, -586317106, -563544698,
-326546674, -174108802, -155900771, -80887355, -78916591, -26690004, 0, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4...
> {code}
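
One way to confirm the symptom described in the report is to check every returned term ID against the valid range [0, vocabSize). The snippet below is only a minimal sketch in plain Scala: `TermIdCheck` and `invalidTermIds` are hypothetical names, not part of Spark, and the sample values are a handful copied from the output quoted in the report.

```scala
// Hypothetical helper for illustration; not part of Spark's API.
object TermIdCheck {
  /** Returns the term IDs that fall outside the valid range [0, vocabSize). */
  def invalidTermIds(termIds: Array[Int], vocabSize: Int): Array[Int] =
    termIds.filter(id => id < 0 || id >= vocabSize)

  def main(args: Array[String]): Unit = {
    val vocabSize = 685 // vocabulary size stated in the report
    // A few values copied from the describeTopics() output quoted in the report
    val sample = Array(2064860663, 617, 0, 45, -2087809139)
    val bad = invalidTermIds(sample, vocabSize)
    println(bad.mkString(", ")) // prints "2064860663, -2087809139"
  }
}
```

Against a real model, the same check could presumably be applied to `ldaModel.describeTopics().flatMap(_._1)` with `ldaModel.vocabSize` as the bound; an empty result would mean all term IDs are valid.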



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

