pig-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: OutOfMemory on DISTINCT
Date Tue, 18 Dec 2007 10:50:00 GMT
Hello,

wc -l on the input file gives 3014571 lines, so it shouldn't be loaded
as a single tuple by Pig.
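
A line count on its own doesn't rule out one very long line hiding among
short ones, so the longest record is worth checking too. A minimal sketch
of such a check, assuming the file is newline-delimited; the helper name
line_stats and the chunk size are made up for illustration and are not
part of Pig or Hadoop:

```python
def line_stats(path, chunk_size=1 << 20):
    """Return (newline_count, longest_line_in_bytes) for a file.

    Reads in fixed-size chunks so a file with no newlines at all
    does not have to fit in memory as a single line.
    """
    newlines = 0
    longest = 0
    current = 0  # bytes seen since the last newline
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                idx = chunk.find(b"\n", start)
                if idx == -1:
                    # No more newlines in this chunk; carry the remainder over.
                    current += len(chunk) - start
                    break
                current += idx - start
                longest = max(longest, current)
                current = 0
                newlines += 1
                start = idx + 1
    # Account for trailing data that has no final newline.
    longest = max(longest, current)
    return newlines, longest
```

Calling line_stats on the input file returns the same count as wc -l plus
the size of the longest line; a longest line close to the file size would
point to the single-tuple case.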

C

Utkarsh Srivastava wrote:
> This is really strange since your job is running out of memory on the 
> map side. This could happen if the input file had no newlines (so that 
> Pig tries to load the whole data set as a tuple). But even then, your 
> data is only 20M.
>
> Utkarsh
>
> On Dec 14, 2007, at 5:07 AM, Craig Macdonald wrote:
>
>> Hi All,
>>
>> I have been trying a really simple DISTINCT operator on a 20MB set of 
>> URLs (Hadoop cluster of 6 nodes - Java VM heap is 1000MB each). Any 
>> idea what's going wrong here?
>>
>> I can't see this being a problem with the ongoing spill stuff, because
>> the dataset is pretty small!
>>
>> The node logs don't give much other information either!
>>
>> Thanks in advance.
>>
>> Craig
>>
>>
>> urls = LOAD 'file:/users/tr.craigm/Blogs08/sourceBlogs/blogger.com/recent-updates/all_13122007.txt';
>>
>> Y = DISTINCT urls;
>> store Y into 'distincUrls';
>>
>> <snip>
>>
>> 2007-12-14 12:55:38,999 [main] INFO  org.apache.pig - Pig progress = 28%
>> 2007-12-14 12:55:43,030 [main] INFO  org.apache.pig - Pig progress = 29%
>> 2007-12-14 13:00:25,230 [main] ERROR org.apache.pig - Error message 
>> from task (map) tip_200712070754_0025_m_000000 
>> java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 2007-12-14 13:00:25,288 [main] ERROR org.apache.pig - Error message 
>> from task (map) tip_200712070754_0025_m_000001 
>> java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 2007-12-14 13:00:25,295 [main] ERROR org.apache.pig - Error message 
>> from task (reduce) tip_200712070754_0025_r_000000
>> Job failed
>> grunt>
>

