mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Memory problems with KMeans
Date Thu, 13 Nov 2008 19:57:48 GMT
How much memory does your laptop have?
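
The quoted trace ends in `org.apache.hadoop.mapred.Child.main`, so the error is raised inside a map task JVM. In Hadoop 0.19 that JVM is started with the options in the `mapred.child.java.opts` property, so flags exported in `HADOOP_OPTS` may never reach it. Below is a minimal, self-contained sketch that uses only the standard `java.lang.management` API. The class and method names are hypothetical. It prints the heap limit and the startup flags of whichever JVM runs it, which is one way to confirm what a task JVM actually received.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: reports the heap limit and startup flags of the current JVM. */
public class JvmMemoryReport {

    /** Maximum heap this JVM will attempt to use, in mebibytes. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** JVM options (e.g. -Xmx..., -XX:...) this process was started with. */
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if the given flag, such as "-XX:-UseGCOverheadLimit", was passed to this JVM. */
    public static boolean hasFlag(String flag) {
        return jvmArguments().contains(flag);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("GC overhead limit disabled: " + hasFlag("-XX:-UseGCOverheadLimit"));
    }
}
```

Running `java JvmMemoryReport` with and without `-XX:-UseGCOverheadLimit` on the command line should flip the last line of the output.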

On Nov 13, 2008, at 11:53 AM, Philippe Lamarche wrote:

> Hi,
>
> I am using KMeans to do some text clustering, and I am running into memory
> problems. So far, I have only tried it on a laptop in pseudo-distributed
> master/slave mode.
>
> This is on Hadoop branch-0.19. The "texttovector.jar" contains a hacked
> version of the syntheticcontrol KMeans example; the only difference is in
> the first input phase.
>
> Is this memory error "normal"? I am running with:
>
> export HADOOP_OPTS="-server -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:NewSize=1G -XX:MaxNewSize=1G -XX:-UseGCOverheadLimit"
>
> My understanding is that "-XX:-UseGCOverheadLimit" should disable the
> GC overhead limit "feature".
>
> Any ideas?
>
> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar \
>   /home/philippe/workspace/MTI830/dist/texttovector.jar \
>   org.apache.mahout.clustering.text.kmeans.Job \
>   testallmti/vectors/part* testallclusteroutput1 \
>   org.apache.mahout.utils.TanimotoDistanceMeasure 1.001 .001 .000005 10
> 08/11/13 11:37:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 08/11/13 11:37:23 INFO mapred.FileInputFormat: Total input paths to process : 1
> 08/11/13 11:37:23 INFO mapred.JobClient: Running job: job_200811131133_0007
> 08/11/13 11:37:24 INFO mapred.JobClient:  map 0% reduce 0%
> 08/11/13 11:37:37 INFO mapred.JobClient:  map 31% reduce 0%
> 08/11/13 11:37:42 INFO mapred.JobClient:  map 63% reduce 0%
> 08/11/13 11:37:45 INFO mapred.JobClient:  map 83% reduce 0%
> 08/11/13 11:37:50 INFO mapred.JobClient:  map 100% reduce 0%
> 08/11/13 11:37:51 INFO mapred.JobClient: Job complete: job_200811131133_0007
> 08/11/13 11:37:51 INFO mapred.JobClient: Counters: 7
> 08/11/13 11:37:51 INFO mapred.JobClient:   File Systems
> 08/11/13 11:37:51 INFO mapred.JobClient:     HDFS bytes read=118875664
> 08/11/13 11:37:51 INFO mapred.JobClient:     HDFS bytes written=146866785
> 08/11/13 11:37:51 INFO mapred.JobClient:   Job Counters
> 08/11/13 11:37:51 INFO mapred.JobClient:     Launched map tasks=2
> 08/11/13 11:37:51 INFO mapred.JobClient:     Data-local map tasks=2
> 08/11/13 11:37:51 INFO mapred.JobClient:   Map-Reduce Framework
> 08/11/13 11:37:51 INFO mapred.JobClient:     Map input records=1702
> 08/11/13 11:37:51 INFO mapred.JobClient:     Map input bytes=118836254
> 08/11/13 11:37:51 INFO mapred.JobClient:     Map output records=1702
> 08/11/13 11:37:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 08/11/13 11:37:51 INFO mapred.FileInputFormat: Total input paths to process : 2
> 08/11/13 11:37:51 INFO mapred.JobClient: Running job: job_200811131133_0008
> 08/11/13 11:37:52 INFO mapred.JobClient:  map 0% reduce 0%
> 08/11/13 11:38:07 INFO mapred.JobClient:  map 4% reduce 0%
> 08/11/13 11:38:12 INFO mapred.JobClient:  map 9% reduce 0%
> 08/11/13 11:38:17 INFO mapred.JobClient:  map 11% reduce 0%
> 08/11/13 11:38:22 INFO mapred.JobClient:  map 13% reduce 0%
> 08/11/13 11:38:27 INFO mapred.JobClient:  map 15% reduce 0%
> 08/11/13 11:38:32 INFO mapred.JobClient:  map 16% reduce 0%
> 08/11/13 11:38:37 INFO mapred.JobClient:  map 18% reduce 0%
> 08/11/13 11:38:42 INFO mapred.JobClient:  map 19% reduce 0%
> 08/11/13 11:38:47 INFO mapred.JobClient:  map 21% reduce 0%
> 08/11/13 11:38:52 INFO mapred.JobClient:  map 22% reduce 0%
> 08/11/13 11:38:57 INFO mapred.JobClient:  map 23% reduce 0%
> 08/11/13 11:39:01 INFO mapred.JobClient:  map 24% reduce 0%
> 08/11/13 11:39:06 INFO mapred.JobClient:  map 25% reduce 0%
> 08/11/13 11:39:12 INFO mapred.JobClient:  map 26% reduce 0%
> 08/11/13 11:39:17 INFO mapred.JobClient:  map 27% reduce 0%
> 08/11/13 11:39:27 INFO mapred.JobClient:  map 28% reduce 0%
> 08/11/13 11:39:37 INFO mapred.JobClient:  map 29% reduce 0%
> 08/11/13 11:39:47 INFO mapred.JobClient:  map 30% reduce 0%
> 08/11/13 11:39:57 INFO mapred.JobClient:  map 31% reduce 0%
> 08/11/13 11:40:07 INFO mapred.JobClient:  map 32% reduce 0%
> 08/11/13 11:40:17 INFO mapred.JobClient:  map 33% reduce 0%
> 08/11/13 11:40:32 INFO mapred.JobClient:  map 34% reduce 0%
> 08/11/13 11:40:42 INFO mapred.JobClient:  map 35% reduce 0%
> 08/11/13 11:40:52 INFO mapred.JobClient:  map 36% reduce 0%
> 08/11/13 11:41:07 INFO mapred.JobClient:  map 37% reduce 0%
> 08/11/13 11:41:17 INFO mapred.JobClient:  map 38% reduce 0%
> 08/11/13 11:41:33 INFO mapred.JobClient:  map 39% reduce 0%
> 08/11/13 11:41:38 INFO mapred.JobClient:  map 40% reduce 0%
> 08/11/13 11:41:53 INFO mapred.JobClient:  map 41% reduce 0%
> 08/11/13 11:42:03 INFO mapred.JobClient:  map 42% reduce 0%
> 08/11/13 11:42:17 INFO mapred.JobClient:  map 43% reduce 0%
> 08/11/13 11:42:32 INFO mapred.JobClient:  map 44% reduce 0%
> 08/11/13 11:42:42 INFO mapred.JobClient:  map 45% reduce 0%
> 08/11/13 11:42:57 INFO mapred.JobClient:  map 46% reduce 0%
> 08/11/13 11:43:13 INFO mapred.JobClient:  map 47% reduce 0%
> 08/11/13 11:43:33 INFO mapred.JobClient:  map 48% reduce 0%
> 08/11/13 11:43:48 INFO mapred.JobClient:  map 49% reduce 0%
> 08/11/13 11:44:08 INFO mapred.JobClient:  map 50% reduce 0%
> 08/11/13 11:44:28 INFO mapred.JobClient:  map 51% reduce 0%
> 08/11/13 11:44:53 INFO mapred.JobClient:  map 52% reduce 0%
> 08/11/13 11:45:23 INFO mapred.JobClient:  map 53% reduce 0%
> 08/11/13 11:46:03 INFO mapred.JobClient:  map 54% reduce 0%
> 08/11/13 11:46:10 INFO mapred.JobClient:  map 28% reduce 0%
> 08/11/13 11:46:10 INFO mapred.JobClient: Task Id : attempt_200811131133_0008_m_000000_0, Status : FAILED
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>    at org.apache.mahout.matrix.DenseVector$Iterator.next(DenseVector.java:184)
>    at org.apache.mahout.matrix.DenseVector$Iterator.next(DenseVector.java:172)
>    at org.apache.mahout.utils.TanimotoDistanceMeasure.distance(TanimotoDistanceMeasure.java:73)
>    at org.apache.mahout.clustering.canopy.Canopy.emitPointToNewCanopies(Canopy.java:181)
>    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:42)
>    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>    at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> hadoop@philippe-vaio:/usr/local/hadoop$
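
The trace fails inside `Canopy.emitPointToNewCanopies`, which measures the Tanimoto distance from each incoming point to every existing canopy center. Below is a simplified, self-contained sketch of that step. It assumes the classic canopy rule: a point joins every canopy closer than T1, and it starts a new canopy unless some center is closer than T2. The sketch uses plain `double[]` arrays instead of Mahout's `DenseVector`. The class and method names are hypothetical, and this is not Mahout's code.

Assume the hacked Job keeps the syntheticcontrol argument order (input, output, distance measure, T1, T2, convergence delta, maximum iterations). Then the command uses T1 = 1.001 and T2 = .001. For non-negative vectors such as term weights, Tanimoto distance lies in [0, 1], so under these settings only near-duplicate points are absorbed. The number of stored centers, and the distance computations per point, grow with the input.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified sketch of the canopy-assignment step that appears in the
 * trace (not Mahout's implementation). Vectors are plain double arrays of equal length.
 */
public class CanopySketch {

    /** Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b). */
    public static double tanimotoDistance(double[] a, double[] b) {
        double ab = 0.0, aa = 0.0, bb = 0.0;
        for (int i = 0; i < a.length; i++) {
            ab += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        double denominator = aa + bb - ab;
        // Two all-zero vectors: treat them as identical.
        if (denominator == 0.0) {
            return 0.0;
        }
        return 1.0 - ab / denominator;
    }

    /**
     * Assigns one point: it joins every canopy whose center is closer than t1, and it
     * becomes the center of a new canopy unless some center is closer than t2.
     * Returns the number of canopies the point belongs to (including a new one).
     */
    public static int addPoint(double[] point, List<double[]> centers, double t1, double t2) {
        boolean stronglyBound = false;
        int memberships = 0;
        for (double[] center : centers) {
            double dist = tanimotoDistance(center, point);
            if (dist < t1) {
                memberships++;
            }
            stronglyBound = stronglyBound || dist < t2;
        }
        if (!stronglyBound) {
            centers.add(point.clone()); // each new canopy keeps its own copy of the vector
            memberships++;
        }
        return memberships;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1, 0, 0, 2}, {1, 0, 0, 1}, {0, 3, 1, 0}, {0, 2, 1, 0}, {1, 1, 1, 1}
        };
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            addPoint(p, centers, 1.001, 0.001);
        }
        // With t2 this small, every distinct point here starts its own canopy.
        System.out.println("canopies: " + centers.size()); // prints "canopies: 5"
    }
}
```

Raising T2 to 0.3 on the same five points leaves three canopies, because the near neighbours are absorbed instead of becoming centers.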

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
