pig-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject OutOfMemory on DISTINCT
Date Fri, 14 Dec 2007 13:07:31 GMT
Hi All,

I have been trying a really simple DISTINCT operation on a 20MB set of
URLs (Hadoop cluster of 6 nodes; the Java VM heap is 1000MB on each),
and it fails with an OutOfMemoryError. Any idea what's going wrong here?

I can't see this being a problem related to the ongoing spill work,
because the dataset is pretty small!

The node logs don't give much other information either!

Thanks in advance.

Craig


urls = LOAD 'file:/users/tr.craigm/Blogs08/sourceBlogs/blogger.com/recent-updates/all_13122007.txt';
Y = DISTINCT urls;
store Y into 'distincUrls';

<snip>

2007-12-14 12:55:38,999 [main] INFO  org.apache.pig - Pig progress = 28%
2007-12-14 12:55:43,030 [main] INFO  org.apache.pig - Pig progress = 29%
2007-12-14 13:00:25,230 [main] ERROR org.apache.pig - Error message from 
task (map) tip_200712070754_0025_m_000000 java.lang.OutOfMemoryError: 
Java heap space
 java.lang.OutOfMemoryError: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 java.lang.OutOfMemoryError: Java heap space

2007-12-14 13:00:25,288 [main] ERROR org.apache.pig - Error message from 
task (map) tip_200712070754_0025_m_000001 java.lang.OutOfMemoryError: 
Java heap space
 java.lang.OutOfMemoryError: Java heap space
 java.lang.OutOfMemoryError: Java heap space

2007-12-14 13:00:25,295 [main] ERROR org.apache.pig - Error message from 
task (reduce) tip_200712070754_0025_r_000000
Job failed
grunt>  
