hadoop-common-user mailing list archives

From Marcus Herou <marcus.he...@tailsweep.com>
Subject Lingering TaskTracker$Child
Date Sun, 25 Jan 2009 16:42:26 GMT
Hi.

Today I noticed, when I ran a Solr indexing job through our Hadoop cluster,
that the master MySQL database was screaming about "Too Many Connections".

I wondered how that could happen, so I logged into my Hadoop machines and
searched through the logs. Nothing strange there. Then I just ran jps:

root@mapreduce1:~# jps
10701 TaskTracker$Child
9567 NameNode
5435 TaskTracker$Child
31801 Bootstrap
7349 TaskTracker$Child
6197 TaskTracker$Child
7761 TaskTracker$Child
10453 TaskTracker$Child
11232 TaskTracker$Child
11113 TaskTracker$Child
9688 DataNode
10877 TaskTracker$Child
6504 TaskTracker$Child
10236 TaskTracker$Child
9852 TaskTracker
6515 TaskTracker$Child
11396 TaskTracker$Child
11741 Jps
6191 TaskTracker$Child
10981 TaskTracker$Child
7742 TaskTracker$Child
5946 TaskTracker$Child
11315 TaskTracker$Child
8112 TaskTracker$Child
11580 TaskTracker$Child
11490 TaskTracker$Child
5687 TaskTracker$Child
5927 TaskTracker$Child
27144 WrapperSimpleApp
7368 TaskTracker$Child

Damn! Each Child has its own DataSource (dbcp pool), tweaked down so it can
only have one active connection to any shard at any time.
Background: I ran out of connections during the Christmas holidays since I
have 60 shards (10 per MySQL machine) and each required a DB pool which
allowed too many active+idle connections.
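
For reference, each shard pool inside the Child is created roughly like this
(simplified sketch; the driver, URL and credentials below are placeholders,
the real values come from config):

import org.apache.commons.dbcp.BasicDataSource;

// Simplified sketch of one shard pool inside a Child (placeholder values).
public BasicDataSource createShardPool(String shardUrl) {
    BasicDataSource ds = new BasicDataSource();
    ds.setDriverClassName("com.mysql.jdbc.Driver");
    ds.setUrl(shardUrl);        // e.g. jdbc:mysql://shard-host:3306/shard_db
    ds.setUsername("crawler");  // placeholder
    ds.setPassword("secret");   // placeholder
    ds.setMaxActive(1);         // at most one active connection per shard
    ds.setMaxIdle(0);           // don't keep idle connections around
    return ds;
}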

Anyway, I have no active jobs at the moment, so the children should have died
by themselves.
Fortunately I have a nice little script which kills the bastards:

jps | egrep "TaskTracker.+" | awk '{print $1}' | xargs kill

I will probably put that in a cronjob which kills long-running children...

Anyway, how can this happen? Am I doing something really stupid along the
way?
Hard facts:
Ubuntu Hardy-Heron, 2.6.24-19-server
java version "1.6.0_06"
Hadoop-0.18.2
It's my own classes that fire the jobs through JobClient
(JobClient.runJob(job)).
I feed the jar to Hadoop by calling job.setJar(jarFile); (the path comes from
a bash script).
I feed dependencies into Hadoop by calling job.set("tmpjars", jarFiles);
(built by parsing the external CLASSPATH env variable in bash).
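
For completeness, the submission looks roughly like this (simplified sketch;
the job name, input path and variable names are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Simplified sketch of the submission; jarFile/jarFiles come from the bash wrapper.
public void submit(String jarFile, String jarFiles) throws Exception {
    JobConf job = new JobConf();
    job.setJobName("solr-indexing");                               // placeholder
    FileInputFormat.setInputPaths(job, new Path("/crawl/input"));  // placeholder path
    job.setJar(jarFile);           // jar built and passed in by the bash script
    job.set("tmpjars", jarFiles);  // comma-separated dependency jars, from CLASSPATH
    JobClient.runJob(job);         // blocks until the job completes
}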

The client does not complain, see example output below (I write no data to HDFS
(HDFS bytes written=774), since I mostly use it for crawling and all
crawlers/indexers access my sharded DB structure directly without
intermediate storage):
2009-01-25 17:12:11.175 INFO main org.apache.hadoop.mapred.FileInputFormat -
Total input paths to process : 1
2009-01-25 17:12:11.176 INFO main org.apache.hadoop.mapred.FileInputFormat -
Total input paths to process : 1
2009-01-25 17:12:11.437 INFO main org.apache.hadoop.mapred.JobClient -
Running job: job_200901251629_0011
2009-01-25 17:12:12.439 INFO main org.apache.hadoop.mapred.JobClient -  map
0% reduce 0%
2009-01-25 17:12:35.481 INFO main org.apache.hadoop.mapred.JobClient -  map
6% reduce 0%
2009-01-25 17:12:40.493 INFO main org.apache.hadoop.mapred.JobClient -  map
21% reduce 0%
2009-01-25 17:12:45.502 INFO main org.apache.hadoop.mapred.JobClient -  map
31% reduce 0%
2009-01-25 17:12:50.511 INFO main org.apache.hadoop.mapred.JobClient -  map
51% reduce 0%
2009-01-25 17:12:55.520 INFO main org.apache.hadoop.mapred.JobClient -  map
67% reduce 0%
2009-01-25 17:13:00.533 INFO main org.apache.hadoop.mapred.JobClient -  map
72% reduce 0%
2009-01-25 17:13:05.543 INFO main org.apache.hadoop.mapred.JobClient -  map
84% reduce 0%
2009-01-25 17:13:10.552 INFO main org.apache.hadoop.mapred.JobClient -  map
95% reduce 0%
2009-01-25 17:13:15.560 INFO main org.apache.hadoop.mapred.JobClient -  map
98% reduce 0%
2009-01-25 17:13:20.568 INFO main org.apache.hadoop.mapred.JobClient - Job
complete: job_200901251629_0011
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -
Counters: 7
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -
File Systems
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -
HDFS bytes read=2741143
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -
HDFS bytes written=774
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -   Job
Counters
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -
Rack-local map tasks=9
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -
Launched map tasks=9
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -
Map-Reduce Framework
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -
Map input records=48314
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -
Map input bytes=2732424
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -
Map output records=0

Any suggestions or pointers would be greatly appreciated. Hmm, come to think
of it: I start X threads from inside Hadoop, almost cut'n'pasted from Nutch.
If a thread somehow lingered, would Hadoop then be unable to shut down even
though there is nothing more to read from the RecordReader?
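
To be concrete, the map side spawns its worker threads roughly like this
(simplified sketch, details omitted). My guess is that if close() skipped the
shutdown/wait, or a task hung, the non-daemon worker threads would keep the
TaskTracker$Child JVM alive after the job?

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CrawlMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Threads created by Executors are non-daemon by default.
    private final ExecutorService pool = Executors.newFixedThreadPool(10);

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        final String url = value.toString();
        pool.submit(new Runnable() {
            public void run() {
                // fetch/index the url against the shard DB (details omitted)
            }
        });
    }

    public void close() throws IOException {
        pool.shutdown();
        try {
            // If this wait were skipped or a task hung, the non-daemon worker
            // threads could keep the child JVM from exiting.
            pool.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}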

Kindly

//Marcus

-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
