Date: Thu, 3 Nov 2016 23:48:59 +0000 (UTC)
From: "Joseph K. Bradley (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Resolved] (SPARK-12488) LDA describeTopics() Generates Invalid Term IDs

     [ https://issues.apache.org/jira/browse/SPARK-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley resolved SPARK-12488.
---------------------------------------
          Resolution: Fixed
            Assignee: Xiangrui Meng
       Fix Version/s: 1.6.1
                      2.0.0
                      1.5.3
                      1.4.2
    Target Version/s:   (was: 2.1.0)

> LDA describeTopics() Generates Invalid Term IDs
> -----------------------------------------------
>
>                 Key: SPARK-12488
>                 URL: https://issues.apache.org/jira/browse/SPARK-12488
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.5.2
>            Reporter: Ilya Ganelin
>            Assignee: Xiangrui Meng
>             Fix For: 1.4.2, 1.5.3, 2.0.0, 1.6.1
>
>
> When running the LDA model, and using the describeTopics function, invalid values appear in the termID list that is returned:
> The below example generates 10 topics on a data set with a vocabulary of 685.
> {code}
> // Set LDA parameters
> val numTopics = 10
> val lda = new LDA().setK(numTopics).setMaxIterations(10)
> val ldaModel = lda.run(docTermVector)
> val distModel = ldaModel.asInstanceOf[org.apache.spark.mllib.clustering.DistributedLDAModel]
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted.reverse
> res40: Array[Int] = Array(2064860663, 2054149956, 1991041659, 1986948613, 1962816105, 1858775243, 1842920256, 1799900935, 1792510791, 1792371944, 1737877485, 1712816533, 1690397927, 1676379181, 1664181296, 1501782385, 1274389076, 1260230987, 1226545007, 1213472080, 1068338788, 1050509279, 714524034, 678227417, 678227086, 624763822, 624623852, 618552479, 616917682, 551612860, 453929488, 371443786, 183302140, 58762039, 42599819, 9947563, 617, 616, 615, 612, 603, 597, 596, 595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576, 575, 574, 573, 572, 571, 570, 569, 568, 567, 566, 565, 564, 563, 562, 561, 560, 559, 558, 557, 556, 555, 554, 553, 552, 551, 550, 549, 548, 547, 546, 545, 544, 543, 542, 541, 540, 539, 538, 537, 536, 535, 534, 533, 532, 53...
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted
> res41: Array[Int] = Array(-2087809139, -2001127319, -1979718998, -1833443915, -1811530305, -1765302237, -1668096260, -1527422175, -1493838005, -1452770216, -1452508395, -1452502074, -1452277147, -1451720206, -1450928740, -1450237612, -1448730073, -1437852514, -1420883015, -1418557080, -1397997340, -1397995485, -1397991169, -1374921919, -1360937376, -1360533511, -1320627329, -1314475604, -1216400643, -1210734882, -1107065297, -1063529036, -1062984222, -1042985412, -1009109620, -951707740, -894644371, -799531743, -627436045, -586317106, -563544698, -326546674, -174108802, -155900771, -80887355, -78916591, -26690004, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4...
> {code}
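
The symptom reported in this issue can be checked mechanically: with a vocabulary of 685, every term ID returned by describeTopics() should lie in [0, 685). The following is a minimal sketch of such a check, assuming the Array[(Array[Int], Array[Double])] shape that MLlib's describeTopics() returns. The TermIdCheck object and its invalidTermIds helper are hypothetical names, not part of Spark, and the toy input reuses a few IDs from the reported output with made-up weights.

```scala
// Hypothetical helper, not part of Spark: flags term IDs outside [0, vocabSize).
object TermIdCheck {

  /** For each topic, returns the term IDs that fall outside [0, vocabSize). */
  def invalidTermIds(
      topics: Array[(Array[Int], Array[Double])],
      vocabSize: Int): Array[Array[Int]] =
    topics.map { case (termIds, _) =>
      termIds.filter(id => id < 0 || id >= vocabSize)
    }

  def main(args: Array[String]): Unit = {
    // Toy stand-in for ldaModel.describeTopics(): the IDs come from the
    // report, the weights are invented for illustration only.
    val topics = Array(
      (Array(617, 0, 2064860663, -2087809139), Array(0.4, 0.3, 0.2, 0.1))
    )

    val bad = invalidTermIds(topics, vocabSize = 685)
    bad.zipWithIndex.foreach { case (ids, topic) =>
      println(s"topic $topic: ${ids.length} invalid term IDs: ${ids.mkString(", ")}")
    }
  }
}
```

Against a real model the same helper could be called as TermIdCheck.invalidTermIds(ldaModel.describeTopics(), vocabSize), where vocabSize is whatever vocabulary size was used to build docTermVector; a non-empty result for any topic reproduces the bug.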