pig-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: OutOfMemory on DISTINCT
Date Fri, 21 Dec 2007 15:59:15 GMT
Correction - Pig/Hadoop doesn't crash.
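For reference, the heap increase discussed below goes in hadoop-site.xml, overriding the hadoop-default.xml value; a minimal sketch using the -Xmx600m figure from this thread:

```xml
<!-- hadoop-site.xml: raise the heap for each task's child JVM -->
<!-- (the task tracker itself runs in a separate JVM) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx600m</value>
</property>
```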

However, the output of DISTINCT is strange for this file, though it works 
OK for my small sample (e.g. 10 URLs).

Uktrash, when you run the job, can you check that its output does in 
fact contain "real URLs" that were present in the input data?
It does not when I run it.
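For concreteness, a minimal Pig Latin sketch of the kind of job under discussion (the file name, alias names, and one-URL-per-line input format are my assumptions, not from the original report):

```pig
-- load one URL per line (field name 'url' is assumed)
urls = LOAD 'urls.txt' AS (url);
-- deduplicate; this is the DISTINCT whose output looks wrong on the large file
uniq = DISTINCT urls;
STORE uniq INTO 'distinct_urls';
```

Comparing the stored output of 'distinct_urls' against a sample of the input should show whether the records are genuine URLs from the data.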

Thanks

Craig

Craig Macdonald wrote:
> Yup, works OK with mapred.child.java.opts set to -Xmx600m.
> So 200 MB is a bit small for even simple Pig jobs.
>
> [ I hadn't appreciated the task tracker is a different Java VM to the 
> jobs]
>
> Thanks for help
>
> Craig
>
>
> Ted Dunning wrote:
>> This is REAL low for some applications.
>>
>>
>> On 12/20/07 6:09 AM, "Craig Macdonald" <craigm@dcs.gla.ac.uk> wrote:
>>
>>  
>>> I assume the memory given to tasks is defined as
>>>     mapred.child.java.opts
>>> whose default value is -Xmx200m
>>>  (see hadoop-default.xml)
>>>
>>> Does this seem too low for this kind of job?
>>>     
>>
>>   
>

