hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Hadoop scripting when to use dfs -put
Date Tue, 14 Feb 2012 14:01:29 GMT
For the sake of http://xkcd.com/979/, and since this was cross-posted:
Håvard managed to solve this specific issue via Joey's response at
https://groups.google.com/a/cloudera.org/group/cdh-user/msg/c55760868efa32e2

2012/2/14 Håvard Wahl Kongsgård <haavard.kongsgaard@gmail.com>:
> The heap size in my environment varies from 2GB to 18GB;
> in mapred-site.xml, mapred.child.java.opts = -Xmx512M
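For reference, that property lives in mapred-site.xml; a minimal fragment with the quoted value (shown only to make the setting concrete, not as a tuning recommendation) would look like:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512M</value>
</property>
```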
>
> System: Ubuntu 10.04 LTS, java-6-sun-1.6.0.26, latest Cloudera version of Hadoop
>
>
> This log is from the task log:
> Original exception was:
> java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
>        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
>        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>        at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:212)
>        at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
>        at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
>        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:418)
>
>
> I don't have a recursive loop, a while loop, or anything similar.
>
> My dumbo code is below. multi_tree() is just a simple function whose
> error handling is a bare try/except that passes on any exception.
>
> def mapper(key, value):
>   v = value.split(" ")[0]
>   yield multi_tree(v), 1
>
>
> if __name__ == "__main__":
>   import dumbo
>   dumbo.run(mapper)
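A plain-Python sketch of the mapper above, runnable without the dumbo runtime; multi_tree() here is a hypothetical stub, since the real function is not shown in the thread:

```python
def multi_tree(v):
    # Stand-in for the poster's "simple function"; the real
    # implementation is not shown in the thread.
    return v

def mapper(key, value):
    v = value.split(" ")[0]   # first whitespace-separated token of the line
    yield multi_tree(v), 1    # emit (key, count), as in the original code

# Feed one record through, as the streaming runtime would:
pairs = list(mapper(0, "token rest of line"))
```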
>
>
> -Håvard
>
>
> On Mon, Feb 13, 2012 at 8:52 PM, Rohit <rohit@hortonworks.com> wrote:
>> Hi,
>>
>> What threw the heap error? Was it the Java VM, or the shell environment?
>>
>> It would be good to look at free RAM on your system before and after
>> running the script as well, to see if your system is running low on memory.
>>
>> Are you using a recursive loop in your script?
>>
>> Thanks,
>> Rohit
>>
>>
>> Rohit Bakhshi
>> www.hortonworks.com
>>
>> On Monday, February 13, 2012 at 10:39 AM, Håvard Wahl Kongsgård wrote:
>>
>>> Hi, I originally posted this on the dumbo forum, but it's more of a
>>> general Hadoop scripting issue.
>>>
>>> When testing a simple script that created some local files
>>> and then copied them to HDFS with
>>> os.system("hadoop dfs -put /home/havard/bio_sci/file.json
>>> /tmp/bio_sci/file.json")
>>>
>>> the tasks fail with an out-of-heap-memory error. The files are tiny,
>>> and I have tried increasing the heap size. When I skip the
>>> hadoop dfs -put, the tasks do not fail.
>>>
>>> Is it wrong to use hadoop dfs -put inside a script run with
>>> hadoop? Should I instead transfer the files at the end with a
>>> combiner, or simply mount HDFS locally and write directly to it?
>>> Any general suggestions?
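Independent of the heap question, one small robustness improvement over os.system() is to shell out via subprocess with error checking, sketched below. It assumes only that the hadoop CLI is on the task's PATH; the runner parameter is a hypothetical hook added so the sketch can be exercised without a cluster:

```python
import subprocess

def hdfs_put(local_path, hdfs_path, runner=subprocess.run):
    """Copy a local file to HDFS via the hadoop CLI.

    Every call forks a fresh JVM for the hadoop client, which is heavy
    inside a map task; where possible, do the copy once outside the job.
    """
    cmd = ["hadoop", "fs", "-put", local_path, hdfs_path]
    # check=True raises CalledProcessError on a non-zero exit code,
    # unlike os.system(), whose return value is easy to ignore.
    return runner(cmd, check=True)

# Exercised with a fake runner that just records the command line:
calls = []
hdfs_put("/tmp/file.json", "/tmp/bio_sci/file.json",
         runner=lambda cmd, check: calls.append(cmd))
```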
>>>
>>>
>>> --
>>> Håvard Wahl Kongsgård
>>> NTNU
>>>
>>> http://havard.security-review.net/
>>
>
>
>
> --
> Håvard Wahl Kongsgård
> NTNU
>
> http://havard.security-review.net/



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
