Date: Sun, 3 Jun 2012 19:22:24 +0000 (UTC)
From: "Hudson (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Message-ID: <1585905699.32171.1338751344359.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <757684725.8668.1330634159416.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAHOUT-986) OutOfMemoryError in LanczosState by way of SpectralKMeans

    [ https://issues.apache.org/jira/browse/MAHOUT-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288240#comment-13288240 ]

Hudson commented on MAHOUT-986:
-------------------------------

Integrated in Mahout-Quality #1517 (See [https://builds.apache.org/job/Mahout-Quality/1517/])
    MAHOUT-986 Remove old LDA implementation from codebase (Revision 1345736)

     Result = FAILURE
ssc :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1345736
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADocumentTopicMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDAInference.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDAReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDAState.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDAUtil.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDAWordTopicMapper.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestLDAInference.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java
* /mahout/trunk/src/conf/driver.classes.props
* /mahout/trunk/src/conf/lda.props
* /mahout/trunk/src/conf/ldatopics.props


> OutOfMemoryError in LanczosState by way of SpectralKMeans
> ---------------------------------------------------------
>
>                 Key: MAHOUT-986
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-986
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: Ubuntu 11.10 (64-bit)
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>            Priority: Minor
>             Fix For: 0.7
>
>
> Dan Brickley and I have been testing SpectralKMeans with a dbpedia dataset ( http://danbri.org/2012/spectral/dbpedia/ ); effectively, a graph with 4,192,499 nodes. Not surprisingly, the LanczosSolver throws an OutOfMemoryError when it attempts to instantiate a DenseMatrix of dimensions 4192499-by-4192499 (~17.5 trillion double-precision floating point values).
> Here's the full stack trace:
> {quote}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:50)
>     at org.apache.mahout.math.decomposer.lanczos.LanczosState.<init>(LanczosState.java:45)
>     at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:146)
>     at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:53)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {quote}
> Obviously SKM needs a more sustainable and memory-efficient way of performing an eigen-decomposition of the graph laplacian. For those who are more knowledgeable in the linear systems solvers of Mahout than I, can the Lanczos parameters be tweaked to negate the requirement of a full DenseMatrix? Or should SKM move to SSVD instead?

--
This message is automatically generated by JIRA.
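
The memory arithmetic in the quoted issue description can be checked with a short calculation. The following is a minimal sketch, not Mahout code: the class name DenseFootprint and the rank k = 100 are hypothetical, chosen only for illustration, and the estimate counts raw 8-byte doubles while ignoring JVM object and array overhead. The k-by-n figure is included only because a Lanczos iteration in general keeps one length-n basis vector per step; whether Mahout's LanczosState can be sized that way is exactly the open question in the report.

```java
// Back-of-the-envelope heap estimate for dense double matrices.
// The node count comes from the issue description; the rank k below is a
// hypothetical value chosen only for illustration.
class DenseFootprint {
    static final long BYTES_PER_DOUBLE = 8L;

    /** Number of entries in a rows-by-cols dense matrix (throws on long overflow). */
    static long entries(long rows, long cols) {
        return Math.multiplyExact(rows, cols);
    }

    /** Bytes needed for the raw double values alone, ignoring JVM overhead. */
    static long bytes(long rows, long cols) {
        return Math.multiplyExact(entries(rows, cols), BYTES_PER_DOUBLE);
    }

    public static void main(String[] args) {
        long n = 4_192_499L; // nodes in the dbpedia graph, per the report
        long k = 100L;       // hypothetical number of basis vectors, not from the report

        System.out.printf("n x n entries: %d%n", entries(n, n));
        System.out.printf("n x n bytes:   %d (about %.1f TB)%n",
                bytes(n, n), bytes(n, n) / 1e12);
        System.out.printf("k x n bytes:   %d (about %.2f GB)%n",
                bytes(k, n), bytes(k, n) / 1e9);
    }
}
```

Under these assumptions the full n-by-n matrix holds 17,577,047,865,001 values, or roughly 140.6 TB of raw doubles, which no single JVM heap will accommodate. A k-by-n array with the hypothetical k = 100 needs about 3.35 GB.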