Return-Path: Delivered-To: apmail-hadoop-core-user-archive@www.apache.org Received: (qmail 45924 invoked from network); 25 Jan 2009 16:54:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 25 Jan 2009 16:54:16 -0000 Received: (qmail 89412 invoked by uid 500); 25 Jan 2009 16:54:09 -0000 Delivered-To: apmail-hadoop-core-user-archive@hadoop.apache.org Received: (qmail 89234 invoked by uid 500); 25 Jan 2009 16:54:09 -0000 Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-user@hadoop.apache.org Delivered-To: mailing list core-user@hadoop.apache.org Received: (qmail 89223 invoked by uid 500); 25 Jan 2009 16:54:09 -0000 Delivered-To: apmail-lucene-hadoop-user@lucene.apache.org Received: (qmail 89220 invoked by uid 99); 25 Jan 2009 16:54:09 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jan 2009 08:54:09 -0800 X-ASF-Spam-Status: No, hits=3.4 required=10.0 tests=FUZZY_CPILL,HTML_MESSAGE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [74.125.44.30] (HELO yx-out-2324.google.com) (74.125.44.30) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jan 2009 16:54:00 +0000 Received: by yx-out-2324.google.com with SMTP id 3so1965797yxj.5 for ; Sun, 25 Jan 2009 08:53:38 -0800 (PST) MIME-Version: 1.0 Received: by 10.150.121.6 with SMTP id t6mr1387836ybc.131.1232902418729; Sun, 25 Jan 2009 08:53:38 -0800 (PST) In-Reply-To: <7e536b1f0901250842l32ae8240secad933cad042a18@mail.gmail.com> References: <7e536b1f0901250842l32ae8240secad933cad042a18@mail.gmail.com> Date: Sun, 25 Jan 2009 17:53:38 +0100 Message-ID: <7e536b1f0901250853h3edc8928r7b8a7b45a0a51674@mail.gmail.com> Subject: Re: Lingering TaskTracker$Child From: Marcus Herou To: hadoop-user@lucene.apache.org Content-Type: multipart/alternative; boundary=000e0cd6ad38b60a990461517636 X-Virus-Checked: Checked by ClamAV on apache.org --000e0cd6ad38b60a990461517636 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Some extra info, apparently the child exits with a status of 143. 2009-01-25 17:13:11.110 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_000005_0 1.0% 5364 docs, 0 errors, 124.7 docs/s 2009-01-25 17:13:11.114 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - Task attempt_200901251629_0011_m_000005_0 is done. 2009-01-25 17:13:11.116 INFO main org.apache.hadoop.mapred.TaskRunner - Task 'attempt_200901251629_0011_m_000005_0' done. 2009-01-25 17:13:12.644 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_000005_0 1.0% 5364 docs, 0 errors, 124.7 docs/s 2009-01-25 17:13:12.644 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_000005_0 Ignoring status-update since task is 'done' 2009-01-25 17:13:24.996 INFO taskCleanup org.apache.hadoop.mapred.TaskTracker - Received 'KillJobAction' for job: job_200901251629_0011 2009-01-25 17:13:24.996 INFO taskCleanup org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0011_m_000005_0 done; removing files. 2009-01-25 17:47:22.668 WARN Thread-23 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0001_m_000006_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:22.669 WARN Thread-23 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0001_m_000006_0. Ignored. 2009-01-25 17:47:22.671 WARN Thread-23 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0001_m_000006_0. Ignored. 2009-01-25 17:47:22.713 WARN Thread-79 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0002_m_000007_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:22.713 WARN Thread-159 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0011_m_000005_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:22.713 WARN Thread-79 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0002_m_000007_0. Ignored. 2009-01-25 17:47:22.714 WARN Thread-159 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0011_m_000005_0. Ignored. 2009-01-25 17:47:22.714 WARN Thread-79 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0002_m_000007_0. Ignored. 2009-01-25 17:47:22.715 WARN Thread-159 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0011_m_000005_0. Ignored. 2009-01-25 17:47:22.811 WARN Thread-119 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0006_m_000007_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:22.811 WARN Thread-119 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0006_m_000007_0. Ignored. 2009-01-25 17:47:22.812 WARN Thread-119 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0006_m_000007_0. Ignored. 2009-01-25 17:47:22.960 WARN Thread-89 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0003_m_000007_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:22.961 WARN Thread-89 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0003_m_000007_0. Ignored. 2009-01-25 17:47:22.962 WARN Thread-89 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0003_m_000007_0. Ignored. 2009-01-25 17:47:23.060 WARN Thread-129 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0008_m_000003_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:23.061 WARN Thread-129 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0008_m_000003_0. Ignored. 2009-01-25 17:47:23.062 WARN Thread-129 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0008_m_000003_0. Ignored. 2009-01-25 17:47:23.279 WARN Thread-99 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0004_m_000007_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:23.279 WARN Thread-99 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0004_m_000007_0. Ignored. 2009-01-25 17:47:23.280 WARN Thread-99 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0004_m_000007_0. Ignored. 2009-01-25 17:47:23.360 WARN Thread-139 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0009_m_000004_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:23.360 WARN Thread-139 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0009_m_000004_0. Ignored. 2009-01-25 17:47:23.361 WARN Thread-139 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0009_m_000004_0. Ignored. 2009-01-25 17:47:23.445 WARN Thread-149 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0010_m_000004_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:23.445 WARN Thread-149 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0010_m_000004_0. Ignored. 2009-01-25 17:47:23.446 WARN Thread-149 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0010_m_000004_0. Ignored. 2009-01-25 17:47:23.463 WARN Thread-109 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0005_m_000007_0 Child Error java.io.IOException: Task process exit with nonzero status of 143. at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462) at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403) 2009-01-25 17:47:23.464 WARN Thread-109 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0005_m_000007_0. Ignored. 2009-01-25 17:47:23.465 WARN Thread-109 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0005_m_000007_0. Ignored. 2009-01-25 17:48:15.315 INFO main org.apache.hadoop.mapred.TaskTracker - LaunchTaskAction: attempt_200901251629_0012_m_000004_0 2009-01-25 17:48:28.088 INFO main org.apache.hadoop.mapred.MapTask - numReduceTasks: 0 2009-01-25 17:48:34.097 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.07784176% 347 docs, 0 errors, 69.4 docs/s 2009-01-25 17:48:37.099 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.13955688% 673 docs, 0 errors, 84.1 docs/s 2009-01-25 17:48:40.100 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.20145722% 1003 docs, 0 errors, 91.2 docs/s 2009-01-25 17:48:43.102 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.26416454% 1343 docs, 0 errors, 95.9 docs/s 2009-01-25 17:48:46.104 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.32754657% 1673 docs, 0 errors, 98.4 docs/s 2009-01-25 17:48:49.105 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.39229122% 2025 docs, 0 errors, 101.3 docs/s 2009-01-25 17:48:52.106 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.44708738% 2315 docs, 0 errors, 100.7 docs/s 2009-01-25 17:48:55.108 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.5117659% 2658 docs, 0 errors, 102.2 docs/s 2009-01-25 17:48:58.109 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.5795401% 3019 docs, 0 errors, 104.1 docs/s 2009-01-25 17:49:01.111 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.6468777% 3377 docs, 0 errors, 105.5 docs/s 2009-01-25 17:49:04.112 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.71514136% 3741 docs, 0 errors, 106.9 docs/s 2009-01-25 17:49:07.114 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.78037554% 4094 docs, 0 errors, 107.7 docs/s 2009-01-25 17:49:10.115 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.84660184% 4448 docs, 0 errors, 108.5 docs/s 2009-01-25 17:49:13.117 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.9146803% 4810 docs, 0 errors, 109.3 docs/s 2009-01-25 17:49:16.118 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 0.9521195% 5044 docs, 0 errors, 107.3 docs/s 2009-01-25 17:49:18.626 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 1.0% 5364 docs, 0 errors, 107.3 docs/s 2009-01-25 17:49:18.631 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - Task attempt_200901251629_0012_m_000004_0 is done. 2009-01-25 17:49:18.631 INFO main org.apache.hadoop.mapred.TaskRunner - Task 'attempt_200901251629_0012_m_000004_0' done. 2009-01-25 17:49:19.120 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 1.0% 5364 docs, 0 errors, 107.3 docs/s 2009-01-25 17:49:19.120 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_000004_0 Ignoring status-update since task is 'done' 2009-01-25 17:49:35.582 INFO taskCleanup org.apache.hadoop.mapred.TaskTracker - Received 'KillJobAction' for job: job_200901251629_0012 2009-01-25 17:49:35.582 INFO taskCleanup org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0012_m_000004_0 done; removing files. # Still processes left even though the TaskTracker said: Received 'KillJobAction' for job: job_200901251629_0012 root@mapreduce2:~# jps 10732 Jps 10634 TaskTracker$Child 8660 DataNode 8824 TaskTracker 8730 SecondaryNameNode 25060 Bootstrap root@mapreduce2:~# date Sun Jan 25 17:51:48 CET 2009 root@mapreduce2:~# On Sun, Jan 25, 2009 at 5:42 PM, Marcus Herou wrote: > Hi. > > Today I noticed when I ran a Solr Indexing job through our Hadoop cluster > that the master MySQL database where screaming about "Too Many Connections". > > I wondered how that could happen so I logged into my Hadoop machines and > searched through the logs. Nothing strange there. Then I just did a jps: > > root@mapreduce1:~# jps > 10701 TaskTracker$Child > 9567 NameNode > 5435 TaskTracker$Child > 31801 Bootstrap > 7349 TaskTracker$Child > 6197 TaskTracker$Child > 7761 TaskTracker$Child > 10453 TaskTracker$Child > 11232 TaskTracker$Child > 11113 TaskTracker$Child > 9688 DataNode > 10877 TaskTracker$Child > 6504 TaskTracker$Child > 10236 TaskTracker$Child > 9852 TaskTracker > 6515 TaskTracker$Child > 11396 TaskTracker$Child > 11741 Jps > 6191 TaskTracker$Child > 10981 TaskTracker$Child > 7742 TaskTracker$Child > 5946 TaskTracker$Child > 11315 TaskTracker$Child > 8112 TaskTracker$Child > 11580 TaskTracker$Child > 11490 TaskTracker$Child > 5687 TaskTracker$Child > 5927 TaskTracker$Child > 27144 WrapperSimpleApp > 7368 TaskTracker$Child > > Damn! Each Child have it's own DataSource (dbcp pool) tweaked down so it > only can have one active connection to any shard at any time. > Background: I ran out of connections during the Christmas holidays since I > have 60 shards (10 per MySQL machine) and each required a DB-Pool which > allowed too many active+idle connections. > > Anyway I have no active jobs at the moment so the children should have died > by themselves. > Fortunately I have a little nice script which kills the bastards: jps > |egrep "TaskTracker.+" | awk '{print $1}'|xargs kill > I will probably put that in a cronjob which kills long running children... > > Anyway, how can this happen ? Am I doing something really stupid along the > way ? > Hard facts: > Ubuntu Hardy-Heron, 2.6.24-19-server > java version "1.6.0_06" > Hadoop-0.18.2 > It's my own classes which fires the jobs through JobClient > (JobClient.runJob(job)) > I feed the jar to hadoop by issuing: job.setJar(jarFile); (comes from a > bash script) > I feed deps into hadoop by issuing: job.set("tmpjars", jarFiles); (comes by > parsing external CLASSPATH ENV in bash) > > The client do not complain, se example output below (I write no data to > HDFS ((HDFS bytes written=774)), since I mostly use it for crawling and all > crawlers/indexers access my sharding db structure directly without > intermediate storage): > 2009-01-25 17:12:11.175 INFO main org.apache.hadoop.mapred.FileInputFormat > - Total input paths to process : 1 > 2009-01-25 17:12:11.176 INFO main org.apache.hadoop.mapred.FileInputFormat > - Total input paths to process : 1 > 2009-01-25 17:12:11.437 INFO main org.apache.hadoop.mapred.JobClient - > Running job: job_200901251629_0011 > 2009-01-25 17:12:12.439 INFO main org.apache.hadoop.mapred.JobClient - map > 0% reduce 0% > 2009-01-25 17:12:35.481 INFO main org.apache.hadoop.mapred.JobClient - map > 6% reduce 0% > 2009-01-25 17:12:40.493 INFO main org.apache.hadoop.mapred.JobClient - map > 21% reduce 0% > 2009-01-25 17:12:45.502 INFO main org.apache.hadoop.mapred.JobClient - map > 31% reduce 0% > 2009-01-25 17:12:50.511 INFO main org.apache.hadoop.mapred.JobClient - map > 51% reduce 0% > 2009-01-25 17:12:55.520 INFO main org.apache.hadoop.mapred.JobClient - map > 67% reduce 0% > 2009-01-25 17:13:00.533 INFO main org.apache.hadoop.mapred.JobClient - map > 72% reduce 0% > 2009-01-25 17:13:05.543 INFO main org.apache.hadoop.mapred.JobClient - map > 84% reduce 0% > 2009-01-25 17:13:10.552 INFO main org.apache.hadoop.mapred.JobClient - map > 95% reduce 0% > 2009-01-25 17:13:15.560 INFO main org.apache.hadoop.mapred.JobClient - map > 98% reduce 0% > 2009-01-25 17:13:20.568 INFO main org.apache.hadoop.mapred.JobClient - Job > complete: job_200901251629_0011 > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > Counters: 7 > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > File Systems > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > HDFS bytes read=2741143 > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > HDFS bytes written=774 > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > Job Counters > 2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - > Rack-local map tasks=9 > 2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient - > Launched map tasks=9 > 2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient - > Map-Reduce Framework > 2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient - > Map input records=48314 > 2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient - > Map input bytes=2732424 > 2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient - > Map output records=0 > > Any suggestions or pointers would be greatly appreciated. Hmm Coming to > think about something. I start X threads from inside Hadoop almost cut'n > pasted from Nutch. > If a thread somehow would linger, would Hadoop not be able to shutdown even > though there is nothing more to read from the RecordReader ? > > Kindly > > //Marcus > > -- > Marcus Herou CTO and co-founder Tailsweep AB > +46702561312 > marcus.herou@tailsweep.com > http://www.tailsweep.com/ > http://blogg.tailsweep.com/ > -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.herou@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/ --000e0cd6ad38b60a990461517636--