hadoop-hdfs-dev mailing list archives

From "Dapeng Sun (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6211) Thread leak of JobTracker when using OOZIE to submit a job
Date Wed, 09 Apr 2014 05:11:15 GMT
Dapeng Sun created HDFS-6211:

             Summary: Thread leak of JobTracker when using OOZIE to submit a job
                 Key: HDFS-6211
                 URL: https://issues.apache.org/jira/browse/HDFS-6211
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 1.1.2
            Reporter: Dapeng Sun

Scenario: When OOZIE is used to run a Pig script, and fs.default.name is set to a hostname (e.g. an FQDN) but OOZIE specifies an IP address in its configuration, the JobTracker leaks threads after many jobs have been submitted.

I investigated the issue: the JobTracker uses a CACHE in FileSystem to cache DFSClient instances. The CACHE is a Map<Key, FileSystem>, and the cache Key has three members: scheme, server host, and the UserGroupInformation (ugi). When a client requests an instance, FileSystem first tries to return one from this cache.
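The cache-key behavior can be sketched with a simplified model (the class and field names below only loosely mirror Hadoop's internal FileSystem cache; this is not the actual Hadoop code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified model of FileSystem's cache key: scheme + authority + ugi.
// Two keys are equal only if all three parts match, so a key built from
// "namenode.example.com:9000" and one built from "10.0.0.1:9000" point
// at different cache entries even when they name the same cluster.
public class CacheKeySketch {
    static final class Key {
        final String scheme;
        final String authority; // host:port from the filesystem URI
        final String ugi;       // stands in for UserGroupInformation

        Key(String scheme, String authority, String ugi) {
            this.scheme = scheme;
            this.authority = authority;
            this.ugi = ugi;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme)
                && authority.equals(k.authority)
                && ugi.equals(k.ugi);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, ugi);
        }
    }

    static final Map<Key, String> cache = new HashMap<>();

    public static void main(String[] args) {
        cache.put(new Key("hdfs", "namenode.example.com:9000", "root"), "client-1");
        // Same cluster addressed by IP: the lookup misses and a second
        // client would be created and cached under the new key.
        Key ipKey = new Key("hdfs", "10.0.0.1:9000", "root");
        System.out.println(cache.containsKey(ipKey)); // prints false
    }
}
```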

The issue is that the JobTracker crashes from a thread leak after many jobs are submitted through OOZIE, and the leaked thread is the LeaseChecker. It is created by DFSClient and keeps running until the DFSClient is closed. FileSystem is an abstract class, and its implementation DistributedFileSystem has a DFSClient member, so if a cached DFSClient is never closed, its LeaseChecker thread leaks.
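The leak mechanism can be illustrated with a toy client (hypothetical names; the real DFSClient starts its LeaseChecker thread on creation and stops it only when close() is called):

```java
// Toy stand-in for DFSClient: a background "lease checker" thread runs
// until close() is called. If the owning client sits in a cache under a
// key nobody ever closes, this thread lives for the life of the JVM.
public class LeakSketch {
    static final class ToyClient implements AutoCloseable {
        private volatile boolean running = true;
        private final Thread leaseChecker;

        ToyClient() {
            leaseChecker = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(50); // stands in for periodic lease renewal
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }, "LeaseChecker");
            leaseChecker.setDaemon(true);
            leaseChecker.start();
        }

        @Override
        public void close() {
            running = false;
            leaseChecker.interrupt();
            try {
                leaseChecker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean threadAlive() {
            return leaseChecker.isAlive();
        }
    }

    public static void main(String[] args) {
        ToyClient leaked = new ToyClient();       // cached, but never closed
        System.out.println(leaked.threadAlive()); // prints true: thread keeps running

        ToyClient closed = new ToyClient();
        closed.close();                           // closing stops the thread
        System.out.println(closed.threadAlive()); // prints false
    }
}
```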

JobClient generates and uploads the job's related files, such as the properties file "Job_XXX.xml", to HDFS. In a normal cluster, JobClient reads the properties from core-site.xml, so "fs.default.name" is usually the same everywhere. OOZIE is special: it is a workflow engine that uses the same Hadoop Configuration class but does not read the Hadoop configuration files, so every job must specify the HDFS URI and the Job Tracker (Resource Manager) address explicitly. OOZIE puts a property named "fs.default.name" into job.xml.

When the JobTracker initializes a job to run, it reads the job configuration from Job_XXX.xml, and the "fs.default.name" from core-site.xml is overridden by the JobConf. If code working with the JobConf then fetches a DFSClient from the CACHE, the property has changed (e.g. a hostname replaced by an IP address), so it looks up the cache with a different key. Most of the cached DFSClients are closed by CleanupQueue or closed directly, but the client under the changed key is never closed, which causes the thread leak.
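The override step can be simulated with plain map semantics (hypothetical values; Hadoop's Configuration applies resources in load order, so a later non-final value wins):

```java
import java.util.HashMap;
import java.util.Map;

// Simulates how Job_XXX.xml overrides core-site.xml: the job config is
// applied after the cluster config, so its fs.default.name value wins
// and a differently-keyed DFSClient cache entry gets created.
public class OverrideSketch {
    static String effectiveDefaultFs(String coreSiteValue, String jobXmlValue) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", coreSiteValue); // from core-site.xml
        conf.put("fs.default.name", jobXmlValue);   // from Job_XXX.xml: overrides
        return conf.get("fs.default.name");
    }

    public static void main(String[] args) {
        System.out.println(effectiveDefaultFs(
            "hdfs://namenode.example.com:9000",     // cluster-wide hostname
            "hdfs://10.0.0.1:9000"));               // IP supplied by OOZIE
        // prints hdfs://10.0.0.1:9000
    }
}
```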

Possible fixes:
1. Make the property final. After adding a "final" attribute to "fs.default.name", it can never be overridden: once the property is final, no subsequent configuration load can change it. However, this affects the whole cluster, and I am not sure whether other components need to be able to change it.
2. Transform the hostname to an IP address in the JobTracker's cache Key. This needs a patch to HDFS: we should normalize the host before creating the key.
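Fix 2 could be sketched with standard Java name resolution applied to the URI authority before the cache key is built (an illustrative sketch, not the actual patch):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of fix 2: resolve the host part of the filesystem URI to an IP
// address before it goes into the cache key, so "fqdn:9000" and
// "10.0.0.1:9000" collapse to the same key for the same cluster.
public class HostNormalizer {
    static String normalizeAuthority(String authority) {
        int colon = authority.lastIndexOf(':');
        String host = colon >= 0 ? authority.substring(0, colon) : authority;
        String port = colon >= 0 ? authority.substring(colon) : "";
        try {
            return InetAddress.getByName(host).getHostAddress() + port;
        } catch (UnknownHostException e) {
            return authority; // unresolvable: fall back to the original authority
        }
    }

    public static void main(String[] args) {
        // A literal IP resolves to itself, so it is already normalized.
        System.out.println(normalizeAuthority("127.0.0.1:9000")); // prints 127.0.0.1:9000
        // A hostname is replaced by whatever address it resolves to.
        System.out.println(normalizeAuthority("localhost:9000"));
    }
}
```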

Here is an example Pig script that reproduces the issue:
A = LOAD '/user/root/oozietest/input.data' USING PigStorage(',') AS (c1:int, c2:chararray);
B = GROUP A BY c2;
STORE B INTO '/user/root/oozietest/output' USING PigStorage(';');


<?xml version="1.0"?>
<workflow-app xmlns="uri:oozie:workflow:0.1" name="OoziepigTest">
  <start to="Step1"/>
  <action name="Step1">
    <!-- pig action body omitted -->
    <ok to="end"/>
    <error to="end"/>
  </action>
  <end name="end"/>
</workflow-app>


This message was sent by Atlassian JIRA
