pig-user mailing list archives

From Utkarsh Srivastava <utka...@yahoo-inc.com>
Subject Re: OutOfMemory on DISTINCT
Date Sat, 15 Dec 2007 01:53:56 GMT
This is really strange, since your job is running out of memory on the
map side. That could happen if the input file had no newlines (so
that Pig tries to load the whole data set as a single tuple). But even
then, your data is only 20MB.
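
One way to test the no-newlines theory is to count the newlines in the input and measure its longest record. The following is only a rough Python sketch, not part of Pig: the function name newline_stats and the 1MB chunk size are illustrative assumptions. It reads the file in fixed-size chunks, so it does not itself need to hold the whole file in memory.

```python
def newline_stats(path, chunk_size=1 << 20):
    """Return (newline_count, longest_record_bytes) for the file at path.

    Hypothetical helper for diagnosis only; reads in binary chunks so a
    file with no newlines at all is still handled in bounded memory.
    """
    newline_count = 0
    longest = 0
    current = 0  # bytes seen since the last newline
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                idx = chunk.find(b"\n", start)
                if idx == -1:
                    # No more newlines in this chunk; the record continues.
                    current += len(chunk) - start
                    break
                current += idx - start
                longest = max(longest, current)
                current = 0
                newline_count += 1
                start = idx + 1
    # Account for a final record that is not newline-terminated.
    longest = max(longest, current)
    return newline_count, longest
```

Calling newline_stats on a local copy of the input returns the two numbers. A newline count of zero, or a longest record close to the file size, would be consistent with the whole file being read as one tuple. A file that uses only carriage returns as line endings would also show a count of zero here.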

Utkarsh

On Dec 14, 2007, at 5:07 AM, Craig Macdonald wrote:

> Hi All,
>
> I have been trying a really simple DISTINCT operator on a 20MB set
> of URLs (Hadoop cluster of 6 nodes - Java VM heap is 1000MB each).
> Any idea what's going wrong here?
>
> I can't see this being a problem with the ongoing spill stuff,
> because the dataset is pretty small!
>
> The node logs don't give much other information either!
>
> Thanks in advance.
>
> Craig
>
>
> urls = LOAD 'file:/users/tr.craigm/Blogs08/sourceBlogs/blogger.com/recent-updates/all_13122007.txt';
> Y = DISTINCT urls;
> store Y 'distincUrls'
>
> <snip>
>
> 2007-12-14 12:55:38,999 [main] INFO  org.apache.pig - Pig progress = 28%
> 2007-12-14 12:55:43,030 [main] INFO  org.apache.pig - Pig progress = 29%
> 2007-12-14 13:00:25,230 [main] ERROR org.apache.pig - Error message from task (map) tip_200712070754_0025_m_000000 java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>
> 2007-12-14 13:00:25,288 [main] ERROR org.apache.pig - Error message from task (map) tip_200712070754_0025_m_000001 java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>
> 2007-12-14 13:00:25,295 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712070754_0025_r_000000
> Job failed
> grunt>

