hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5436) job history directory grows without bound, locks up job tracker on new job submission
Date Sun, 08 Mar 2009 05:14:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679953#action_12679953 ]

Amar Kamat commented on HADOOP-5436:
------------------------------------

Tim,

bq. Further investigation showed there were 200,000+ files in the job history folder - and every
submission was creating a FileStatus for them all, then applying a regular expression to just
the name
Hey. I think the regex is passed in the DFS call and the expected answer is just *one* FileStatus
object. I don't know how the regex-based search is implemented, but JobHistory doesn't create
FileStatus objects for all the files.
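To make the cost being debated concrete, here is a minimal self-contained sketch of the expensive pattern: testing a regex against every name in a large flat listing even though only one match is expected. The file-name format and pattern below are hypothetical stand-ins, not JobHistory's actual naming scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HistoryScanSketch {

    // Find history files for one job id in a directory listing. The name
    // format and the pattern are made up for illustration only.
    static List<String> scanWithRegex(List<String> dirListing, String jobId) {
        Pattern p = Pattern.compile(".*_" + Pattern.quote(jobId) + "_.*");
        List<String> matches = new ArrayList<>();
        // Every name in the listing is tested, even though at most one
        // file is expected to match -- so the cost scales with the size
        // of the directory, not with the size of the answer.
        for (String name : dirListing) {
            if (p.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dir = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            dir.add("tracker_job_2009_" + i + "_wordcount");
        }
        // Exactly one file matches, but all 200,000 names were examined.
        System.out.println(scanWithRegex(dir, "job_2009_12345").size());
    }
}
```

Whether the scan happens client-side or inside the DFS call, some component still pays this linear cost per submission while the directory stays flat.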

bq. having Hadoop default to storing all the history files in a single directory is a Bad
Idea
HADOOP-4670 was opened to address this.
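For illustration, one common way to avoid a single flat directory is to bucket files by a hash of the job id. This is only a sketch of the general idea; the path layout and bucket count below are made up and not necessarily what HADOOP-4670 proposes.

```java
public class HistoryBucketSketch {

    // Map a job id to a sub-directory so files spread over numBuckets
    // directories instead of accumulating in one flat folder.
    static String bucketFor(String jobId, int numBuckets) {
        // floorMod keeps the bucket non-negative even for negative hashes.
        int bucket = Math.floorMod(jobId.hashCode(), numBuckets);
        return String.format("history/%03d/%s", bucket, jobId);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("job_200903080001_0042", 256));
    }
}
```

With a scheme like this, a per-job lookup only has to list one small bucket rather than the whole history folder.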

bq. doing expensive processing of every history file on every job submission is a Worse Idea
HADOOP-4372 should help, as there will be no need to access the history folder during job
initialization. But I think the DFS should be efficient enough for regex-based searches.

bq. doing expensive processing of every history file on every job submission while holding
a lock on the JobInProgress object and thereby blocking the jobtracker.jsp from rendering
is a Terrible Idea (note: haven't confirmed this, but a cursory glance suggests that's what's
going on)
The plan is to improve JobTracker locking and make it more fine-grained. But I think HADOOP-4372
should eliminate this.
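The granularity point can be sketched abstractly: do the expensive per-file work before taking the shared lock, and hold the lock only for the short bookkeeping step. The class and method names below are hypothetical, not the actual JobTracker code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LockGranularitySketch {
    private final Object trackerLock = new Object();
    private final Map<String, List<String>> submitted = new HashMap<>();

    // Hypothetical submission path: the costly scan runs outside the
    // lock, so readers of tracker state (e.g. a status page) are not
    // blocked behind a multi-minute directory scan.
    void submitJob(String jobId, List<String> historyListing) {
        List<String> matches = new ArrayList<>();
        for (String name : historyListing) {   // expensive part, lock-free
            if (name.contains(jobId)) {
                matches.add(name);
            }
        }
        synchronized (trackerLock) {           // cheap part, under the lock
            submitted.put(jobId, matches);
        }
    }

    List<String> historyFilesFor(String jobId) {
        synchronized (trackerLock) {
            return submitted.get(jobId);
        }
    }

    public static void main(String[] args) {
        LockGranularitySketch t = new LockGranularitySketch();
        t.submitJob("job_1", List.of("h_job_1_a", "h_job_2_b"));
        System.out.println(t.historyFilesFor("job_1"));
    }
}
```

The trade-off is that state read under the lock may be slightly stale while a scan is in flight, which is usually acceptable for a status page.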

bq. not being able to clean up the mess without taking down the job tracker is just Unfortunate
Look at HADOOP-4167.

> job history directory grows without bound, locks up job tracker on new job submission
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5436
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Tim Williamson
>
> An unpleasant surprise upgrading to 0.19: requests to jobtracker.jsp would take a long
time or even time out whenever new jobs were submitted.  Investigation showed the call to
JobInProgress.initTasks() was calling JobHistory.JobInfo.logSubmitted() which in turn was
calling JobHistory.getJobHistoryFileName() which was pegging the CPU for a couple minutes.
 Further investigation showed there were 200,000+ files in the job history folder -- and every
submission was creating a FileStatus for them all, then applying a regular expression to just
the name.  All this just on the off chance the job tracker had been restarted (see HADOOP-3245).
 To make matters worse, these files cannot be safely deleted while the job tracker is running,
as the disappearance of a history file at the wrong time causes a FileNotFoundException.
> So to summarize the issues:
> - having Hadoop default to storing all the history files in a single directory is a Bad
Idea
> - doing expensive processing of every history file on every job submission is a Worse
Idea
> - doing expensive processing of every history file on every job submission while holding
a lock on the JobInProgress object and thereby blocking the jobtracker.jsp from rendering
is a Terrible Idea (note: haven't confirmed this, but a cursory glance suggests that's what's
going on)
> - not being able to clean up the mess without taking down the job tracker is just Unfortunate

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

