hadoop-hdfs-user mailing list archives

From kaveh minooie <ka...@plutoz.com>
Subject Re: updated to 1.2.1, map completed percentage keeps oscillating
Date Wed, 14 Aug 2013 18:33:11 GMT
Thanks for the hint, though I am not sure this is exactly the case.
When I checked the log files, I found, among other things, these two
memory errors. Could you tell me the difference between them?

2013-08-13 19:17:51,401 ERROR org.apache.nutch.protocol.httpclient.Http: 
Failed with the following error:
java.lang.OutOfMemoryError: Java heap space

and

2013-08-13 19:17:35,977 ERROR org.apache.nutch.fetcher.FetcherJob: 
Unexpected error for http://www.brighton.com/category/20/1/terms
java.lang.OutOfMemoryError: GC overhead limit exceeded


But it wasn't because mapred.child.java.opts was being overwritten
or anything. I could see the parameter in the 'ps ax' output on the node
that was running the job. Also, when I changed mapred-site.xml to use
the new properties ( mapred.{map|reduce}.child.java.opts ) instead, I
started to get these:

13/08/14 10:41:22 INFO mapred.JobClient:  map 0% reduce 0%
13/08/14 10:41:28 INFO mapred.JobClient: Task Id : 
attempt_201308141038_0001_m_000002_0, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201308141038_0001_m_000002_0: Error: Could not find or load main 
class
13/08/14 10:41:31 INFO mapred.JobClient: Task Id : 
attempt_201308141038_0001_r_000031_0, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201308141038_0001_r_000031_0: Error: Could not find or load main 
class
13/08/14 10:41:38 INFO mapred.JobClient: Task Id : 
attempt_201308141038_0001_m_000002_1, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201308141038_0001_m_000002_1: Error: Could not find or load main 
class
13/08/14 10:41:41 INFO mapred.JobClient: Task Id : 
attempt_201308141038_0001_r_000031_1, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
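
One thing that may be worth ruling out for the "Could not find or load main
class" error with an empty class name is stray whitespace in the option
string. This is an assumption, not something confirmed in this thread: if the
task launcher splits the value on single spaces, then a leading space, a
double space, or a line break inside <value> would yield an empty or
malformed argument, which the JVM might take as the main class name. A
minimal Python sketch that flags such tokens (the function names
suspicious_tokens and normalized are hypothetical):

```python
def suspicious_tokens(java_opts):
    """Split on single spaces (the ASSUMED launcher behaviour) and return
    (index, token) pairs that are empty or contain other whitespace."""
    flagged = []
    for index, token in enumerate(java_opts.split(" ")):
        if token == "" or any(ch.isspace() for ch in token):
            flagged.append((index, token))
    return flagged


def normalized(java_opts):
    """Collapse every run of whitespace so the value is one clean line."""
    return " ".join(java_opts.split())
```

If suspicious_tokens returns anything for the text between <value> and
</value>, putting the output of normalized on a single line in
mapred-site.xml is a cheap experiment.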


Now, to the best of my understanding, it wasn't due to a typo or
anything else in the config files. Since these tasks spawned and failed
very quickly, I wasn't able to get to the node to see the actual java
command that was being executed, which brings me to this question:

Is there a way to have either Linux or Hadoop log all the commands (
tasks, command lines, anything ) for either everybody or a particular user?
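
One possible approach on the Linux side, assuming a /proc filesystem is
available, is a small poller that records the command line of every process
owned by a given user. This is only a sketch: the names decode_cmdline,
snapshot, and watch are hypothetical, processes that live for less than one
polling interval can still be missed, and a reused pid is not logged twice.

```python
import os
import pwd
import sys
import time


def decode_cmdline(raw):
    """/proc/<pid>/cmdline separates arguments with NUL bytes."""
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the trailing NUL
    return " ".join(p.decode("utf-8", "replace") for p in parts)


def snapshot(uid, proc_root="/proc"):
    """Return {pid: command line} for processes owned by uid."""
    found = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        path = os.path.join(proc_root, name)
        try:
            if os.stat(path).st_uid != uid:
                continue
            with open(os.path.join(path, "cmdline"), "rb") as handle:
                cmd = decode_cmdline(handle.read())
        except OSError:
            continue  # process exited between listdir and read
        if cmd:
            found[int(name)] = cmd
    return found


def watch(user, interval=0.2, out=sys.stdout):
    """Poll forever, writing each newly seen pid and its command line."""
    uid = pwd.getpwnam(user).pw_uid
    seen = set()
    while True:
        for pid, cmd in snapshot(uid).items():
            if pid not in seen:
                seen.add(pid)
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                out.write("%s pid=%d %s\n" % (stamp, pid, cmd))
                out.flush()
        time.sleep(interval)
```

Calling watch with the account that runs the task JVMs on a worker node, and
redirecting the output to a file, should leave a record of each java command
that was seen, including attempts that fail within seconds.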


thanks


On 08/13/2013 06:31 PM, Adam Muise wrote:
> I'm assuming you are trying the same job with the same data as before.
> Try taking a look at the job output for the mappers in the JobTracker.
> Likely you will see some failures and probably a stack trace. When I see
> this after an upgrade, it's usually because the default child opts JVM
> memory size was overwritten. Check the error but review your settings too.
>
> {map|reduce}.child.java.opts
>
> <property>
> <name>mapred.map.child.java.opts</name>
> <value>
>   -Xmx512M -Djava.library.path=/home/mycompany/lib -verbose:gc
> -Xloggc:/tmp/@taskid@.gc
>   -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> </value>
> </property>
>
> <property>
> <name>mapred.reduce.child.java.opts</name>
> <value>
>   -Xmx1024M -Djava.library.path=/home/mycompany/lib -verbose:gc
> -Xloggc:/tmp/@taskid@.gc
>   -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> </value>
> </property>
>
>
>
>
> On Tue, Aug 13, 2013 at 9:03 PM, kaveh minooie <kaveh@plutoz.com
> <mailto:kaveh@plutoz.com>> wrote:
>
>     Hi everyone, I recently updated my cluster to 1.2.1, and now the
>     percentage of completed map tasks keeps changing while the job is
>     running:
>
>     13/08/13 16:53:01 INFO mapred.JobClient: Running job:
>     job_201308131452_0007
>     13/08/13 16:53:02 INFO mapred.JobClient:  map 0% reduce 0%
>     13/08/13 16:53:19 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:53:22 INFO mapred.JobClient:  map 34% reduce 0%
>     13/08/13 16:53:25 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:53:49 INFO mapred.JobClient:  map 44% reduce 0%
>     13/08/13 16:53:52 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:54:02 INFO mapred.JobClient:  map 38% reduce 0%
>     13/08/13 16:54:05 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:54:14 INFO mapred.JobClient:  map 44% reduce 0%
>     13/08/13 16:54:17 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:54:26 INFO mapred.JobClient:  map 24% reduce 0%
>     13/08/13 16:54:29 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:54:53 INFO mapred.JobClient:  map 24% reduce 0%
>     13/08/13 16:54:56 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:55:05 INFO mapred.JobClient:  map 32% reduce 0%
>     13/08/13 16:55:08 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:55:17 INFO mapred.JobClient:  map 20% reduce 0%
>     13/08/13 16:55:20 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:55:32 INFO mapred.JobClient:  map 4% reduce 0%
>     13/08/13 16:55:35 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:55:47 INFO mapred.JobClient:  map 19% reduce 0%
>     13/08/13 16:55:50 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:56:02 INFO mapred.JobClient:  map 46% reduce 0%
>     13/08/13 16:56:06 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:56:36 INFO mapred.JobClient:  map 29% reduce 0%
>     13/08/13 16:56:39 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:57:07 INFO mapred.JobClient:  map 48% reduce 0%
>     13/08/13 16:57:10 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:58:16 INFO mapred.JobClient:  map 39% reduce 0%
>     13/08/13 16:58:20 INFO mapred.JobClient:  map 2% reduce 0%
>     13/08/13 16:58:23 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:58:32 INFO mapred.JobClient:  map 44% reduce 0%
>     13/08/13 16:58:35 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:58:50 INFO mapred.JobClient:  map 18% reduce 0%
>     13/08/13 16:58:53 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:59:08 INFO mapred.JobClient:  map 16% reduce 0%
>     13/08/13 16:59:11 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:59:42 INFO mapred.JobClient:  map 18% reduce 0%
>     13/08/13 16:59:45 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 16:59:54 INFO mapred.JobClient:  map 11% reduce 0%
>     13/08/13 16:59:57 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:00:09 INFO mapred.JobClient:  map 33% reduce 0%
>     13/08/13 17:00:12 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:00:24 INFO mapred.JobClient:  map 39% reduce 0%
>     13/08/13 17:00:27 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:00:51 INFO mapred.JobClient:  map 37% reduce 0%
>     13/08/13 17:00:54 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:01:12 INFO mapred.JobClient:  map 50% reduce 0%
>     13/08/13 17:01:15 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:01:39 INFO mapred.JobClient:  map 44% reduce 0%
>     13/08/13 17:01:42 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:01:54 INFO mapred.JobClient:  map 36% reduce 0%
>     13/08/13 17:01:57 INFO mapred.JobClient:  map 51% reduce 0%
>     13/08/13 17:02:24 INFO mapred.JobClient:  map 11% reduce 0%
>     13/08/13 17:02:27 INFO mapred.JobClient:  map 51% reduce 0%
>
>
>
>     This is the output of one job with one map task. There was no
>     failure. The map task was not re-spawned. It just ran and finished
>     on the node on which it was started, but this is the output. What gives?
>
>     --
>     Kaveh Minooie
>
>
>
>
> --
> *Adam Muise*
> Solution Engineer
> *Hortonworks*
> amuise@hortonworks.com <mailto:amuise@hortonworks.com>
> 416-417-4037

-- 
Kaveh Minooie
