mahout-dev mailing list archives

From "Jonathan Traupman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-666) DistributedSparseMatrix should clean up after itself when doing times(Vector) and timesSquared(Vector)
Date Tue, 12 Apr 2011 02:25:05 GMT
DistributedSparseMatrix should clean up after itself when doing times(Vector) and timesSquared(Vector)
------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-666
                 URL: https://issues.apache.org/jira/browse/MAHOUT-666
             Project: Mahout
          Issue Type: Bug
          Components: Math
    Affects Versions: 0.5
         Environment: Linux x86_64 2.6.18, Mac OS 10.6 64-bit, Hadoop 0.20.2, Java 1.6
            Reporter: Jonathan Traupman
            Priority: Minor
             Fix For: 0.5


The directories created during the times() and timesSquared() methods in DistributedSparseMatrix
leave behind a lot of cruft. The individual files are tagged with deleteOnExit, but
the directories are not. Also, by not deleting them until JVM exit, a job that does repeated
matrix/vector multiplies, like DistributedLanczosSolver, creates a lot of temp files that
stick around for the whole run, even though the results they contain are read once and then
never again.
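
A rough sketch of the cleanup pattern being asked for: create the temp directory, read the
result once, then delete the whole directory right away instead of waiting for JVM exit.
This is not the Mahout code. It uses the local filesystem through java.nio.file (which needs
Java 7 or later) so that it runs on its own, and the class name, multiplyOnce(), countEntries()
and the "part-00000" file name are all made up for illustration. On HDFS the equivalent
cleanup step would presumably be a recursive FileSystem.delete(path, true) on the output
directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TempDirCleanupSketch {

    /** Recursively deletes a directory tree: files first, then each directory. */
    public static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /**
     * Hypothetical stand-in for one times()/timesSquared() call: writes a result
     * into a fresh temp directory, reads it back once, and removes the directory
     * before returning, even if the read fails.
     */
    public static String multiplyOnce(Path workRoot, int iteration) throws IOException {
        Path tmpDir = Files.createTempDirectory(workRoot, "times-" + iteration + "-");
        try {
            Path output = tmpDir.resolve("part-00000"); // assumed output file name
            Files.write(output, ("result-" + iteration).getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
        } finally {
            deleteRecursively(tmpDir); // clean up now, not at JVM exit
        }
    }

    /** Counts the direct children of a directory. */
    public static int countEntries(Path dir) throws IOException {
        int n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path workRoot = Files.createTempDirectory("lanczos-sketch-");
        try {
            // Simulates a solver doing repeated matrix/vector multiplies.
            for (int i = 0; i < 3; i++) {
                multiplyOnce(workRoot, i);
                System.out.println("iteration " + i + ": leftover entries = "
                        + countEntries(workRoot));
            }
        } finally {
            deleteRecursively(workRoot);
        }
    }
}
```

With this shape the number of temp entries stays flat across iterations rather than growing
with each multiply, which is what matters for a file count quota.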

Our cluster admins enforce both file count and size quotas. Since 5 temp files/directories
are created on each iteration of DistributedLanczosSolver, we're constantly bumping into the
quota with large SVDs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
