hadoop-mapreduce-issues mailing list archives

From "Yufei Gu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6858) HistoryFileManager thrashing due to high volume jobs
Date Tue, 07 Mar 2017 19:14:38 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated MAPREDUCE-6858:
--------------------------------
    Description: 
- JHS scans "done_intermediate" dir for files to process and adds them to a thread pool
- Thread pool starts processing these files to move them to "done" dir
- JHS scans "done_intermediate" again for files to process and adds them to a thread pool
-- If there are enough jobs that the thread pool can't keep up with the scanning interval,
they'll get added twice (or more). If this keeps compounding, jobs would end up piling up,
not getting processed for quite some time, and producing lots of FileNotFoundExceptions.

By default, it looks like the thread pool only has 3 threads in it (mapreduce.jobhistory.move.thread-count).
And the scan interval is 3 minutes (mapreduce.jobhistory.move.interval-ms). Perhaps we should
increase these?

  was:
- JHS scans "done_intermediate" dir for files to process and adds them to a thread pool
- Thread pool starts processing these files to move them to "done" dir
- JHS scans "done_intermediate" again for files to process and adds them to a thread pool
-- If we have enough jobs where the thread pool can't keep up with the scanning interval,
they'll get added twice (or more). If this keeps compounding, I wouldn't be surprised if jobs
end up piling up and not getting processed for quite some time and getting lots of FileNotFoundException's.

By default, it looks like the thread pool only has 3 threads in it (mapreduce.jobhistory.move.thread-count).
And the scan interval is 3 minutes (mapreduce.jobhistory.move.interval-ms). Perhaps we should
increase these?


> HistoryFileManager thrashing due to high volume jobs 
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-6858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6858
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Yufei Gu
>
> - JHS scans "done_intermediate" dir for files to process and adds them to a thread pool
> - Thread pool starts processing these files to move them to "done" dir
> - JHS scans "done_intermediate" again for files to process and adds them to a thread pool
> -- If there are enough jobs that the thread pool can't keep up with the scanning interval,
> they'll get added twice (or more). If this keeps compounding, jobs would end up piling up,
> not getting processed for quite some time, and producing lots of FileNotFoundExceptions.
>
> By default, it looks like the thread pool only has 3 threads in it (mapreduce.jobhistory.move.thread-count).
> And the scan interval is 3 minutes (mapreduce.jobhistory.move.interval-ms). Perhaps we should
> increase these?
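
The scan/enqueue behaviour described above can be illustrated with a small standalone sketch. This is not the JobHistoryServer code and not necessarily the fix adopted for this issue; the class name ScanDedupSketch, the "in flight" guard set, and the job file names are all hypothetical. It only shows, under those assumptions, how a second scan re-queues files whose moves are still pending, and how tracking already-queued files avoids that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from HistoryFileManager.
public class ScanDedupSketch {
    // Files already handed to the pool whose move has not finished yet.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool;
    private final boolean dedup;

    public ScanDedupSketch(int threads, boolean dedup) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.dedup = dedup;
    }

    // One scan pass over a directory listing. Returns how many move tasks it queued.
    public int scan(List<String> listing, CountDownLatch gate) {
        int queued = 0;
        for (String file : listing) {
            if (dedup && !inFlight.add(file)) {
                continue; // already queued by an earlier scan, skip it
            }
            queued++;
            pool.execute(() -> {
                try {
                    gate.await(); // stand-in for a slow move to the "done" dir
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.remove(file);
                }
            });
        }
        return queued;
    }

    // Two back-to-back scans of the same listing while every move is still pending.
    public static int twoScans(boolean dedup) {
        List<String> listing = Arrays.asList(
                "job_1.jhist", "job_2.jhist", "job_3.jhist", "job_4.jhist", "job_5.jhist");
        CountDownLatch gate = new CountDownLatch(1);
        ScanDedupSketch sketch = new ScanDedupSketch(3, dedup); // 3 threads, as in the default
        int total = sketch.scan(listing, gate) + sketch.scan(listing, gate);
        gate.countDown(); // let the pending moves finish
        sketch.pool.shutdown();
        try {
            sketch.pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("tasks queued without guard: " + twoScans(false)); // 10
        System.out.println("tasks queued with guard:    " + twoScans(true));  // 5
    }
}
```

In this sketch, raising the thread count or the scan interval only makes the overlap less likely, while the guard set removes the duplicate submissions regardless of how slow the moves are.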



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

