hadoop-mapreduce-issues mailing list archives

From "Prabhu Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6797) Improvement in the fix of Mapreduce-6684
Date Wed, 19 Oct 2016 11:40:58 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated MAPREDUCE-6797:
-------------------------------------
    Description: 
There is one more piece of code in HistoryFileManager where the synchronized keyword on HistoryFileInfo
needs to be removed. We hit the JobHistoryServer contention issue in our environment, where the
stack trace (attached) shows HistoryFileManager$JobListCache.addIfAbsent unnecessarily
waiting to lock a HistoryFileInfo.

The synchronized modifier on isMovePending and didMoveFail was already removed by MAPREDUCE-6684.

{code}
HistoryFileInfo firstValue = cache.get(key);
synchronized (firstValue) {  // <-- synchronized is not needed here
  if (firstValue.isMovePending()) {
    if (firstValue.didMoveFail() &&
        firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
      cache.remove(key);
      //Now lets try to delete it
      try {
        firstValue.delete();
      } catch (IOException e) {
        LOG.error("Error while trying to delete history files" +
            " that could not be moved to done.", e);
      }
    } else {
      LOG.warn("Waiting to remove " + key
          + " from JobListCache because it is not in done yet.");
    }
  } else {
    cache.remove(key);
  }
}

{code}
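
To illustrate why the extra synchronized block matters, here is a minimal, self-contained sketch. It is not Hadoop code: FileInfo, slowMove, readerFinishesWhileLocked and the 500 ms timeout are hypothetical names and values chosen for the demonstration, and the sketch assumes the state is held in a volatile field so that unsynchronized reads are safe. One thread holds the object's monitor during a slow operation while a second thread tries to read the state, either inside synchronized(info) or without it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MonitorContentionSketch {

  // Hypothetical stand-in for HistoryFileInfo; not the Hadoop class.
  static class FileInfo {
    enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

    // Assumption: state is volatile, so readers need no monitor.
    private volatile State state = State.IN_INTERMEDIATE;

    boolean isMovePending() {
      return state == State.IN_INTERMEDIATE || state == State.MOVE_FAILED;
    }

    boolean didMoveFail() {
      return state == State.MOVE_FAILED;
    }

    // Simulates a long-running operation that holds this object's monitor.
    synchronized void slowMove(CountDownLatch started, CountDownLatch release)
        throws InterruptedException {
      started.countDown();   // signal that the monitor is now held
      release.await();       // keep holding the monitor until released
      state = State.IN_DONE;
    }
  }

  // Returns true if the reader completed while another thread still
  // held the monitor of the FileInfo instance.
  static boolean readerFinishesWhileLocked(boolean useSynchronized)
      throws InterruptedException {
    FileInfo info = new FileInfo();
    CountDownLatch holderStarted = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      try {
        info.slowMove(holderStarted, release);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    holder.start();
    holderStarted.await();   // the holder thread now owns the monitor

    CountDownLatch readerDone = new CountDownLatch(1);
    Thread reader = new Thread(() -> {
      if (useSynchronized) {
        synchronized (info) {      // blocks until the holder exits slowMove
          info.isMovePending();
        }
      } else {
        info.isMovePending();      // plain volatile read, no monitor needed
      }
      readerDone.countDown();
    });
    reader.start();

    boolean finished = readerDone.await(500, TimeUnit.MILLISECONDS);
    release.countDown();     // let the holder finish so both threads exit
    holder.join();
    reader.join();
    return finished;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("synchronized reader finished while locked: "
        + readerFinishesWhileLocked(true));
    System.out.println("unsynchronized reader finished while locked: "
        + readerFinishesWhileLocked(false));
  }
}
```

In this sketch the synchronized reader cannot complete until the monitor is released, which is the BLOCKED state seen in the stack trace, whereas the unsynchronized reader is not held up by the monitor.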


Note: the stack trace below is from hadoop-2.4.0; the problem exists in the latest Hadoop as well.

{code}
"2144820863@qtp-313351300-38156" daemon prio=10 tid=0x0000000001e13800 nid=0xf133 waiting for monitor entry [0x00007f7c1d8dd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
        - waiting to lock <0x000000040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
        - locked <0x0000000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
{code}



> Improvement in the fix of Mapreduce-6684
> ----------------------------------------
>
>                 Key: MAPREDUCE-6797
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.4.0, 2.8.0
>            Reporter: Prabhu Joseph
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

