hadoop-hdfs-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDDS-296) OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan
Date Thu, 26 Jul 2018 08:17:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton updated HDDS-296:
------------------------------
    Attachment: local.png

> OMMetadataManagerLock is hold by getPendingDeletionKeys for a full table scan
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-296
>                 URL: https://issues.apache.org/jira/browse/HDDS-296
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Priority: Critical
>             Fix For: 0.2.1
>
>         Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. I first saw it on a Kubernetes-based pseudo cluster (50 datanodes, 1 freon). After a while the key allocation rate slowed down (see the attached image).
> I could also reproduce the problem with a local cluster (I used the hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys, key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock in the Ozone Manager. (We profiled the OM with VisualVM and found that the code is locked for an extremely long time; we also checked the RocksDB/RPC metrics from Prometheus and everything else worked well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. With a custom build we identified that the deletion service holds the OMMetadataManager lock for a full range scan. For 1 million keys it took about 10 seconds (on my local developer machine with an SSD).
> {code}
> ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held time above threshold: lock identifier: OMMetadataManagerLock lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1  | org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1  | org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
> {code}
> With the DeletionService disabled, key creation worked well.
> The deletion service should be improved so that it works without long-term locking.
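
One possible way to avoid holding the lock for a full table scan is to scan in bounded batches and release the lock between them, so that writers can make progress while the scan is still running. The following is only a minimal sketch of that idea, not the Ozone Manager code: the class `BatchedDeletionScan`, the `#deleting#` marker, the in-memory `TreeMap` standing in for the RocksDB table, and the `batchSize` parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BatchedDeletionScan {
    // Hypothetical marker for keys that are waiting for deletion.
    static final String DELETING_PREFIX = "#deleting#";

    // Sorted in-memory map standing in for the metadata table (assumption).
    private final NavigableMap<String, String> table = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Counts read-lock acquisitions so the batching is observable.
    private final AtomicInteger lockAcquisitions = new AtomicInteger();

    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            table.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int lockAcquisitions() {
        return lockAcquisitions.get();
    }

    // Collects pending-deletion keys, holding the read lock for at most
    // batchSize entries at a time instead of for the whole scan.
    List<String> getPendingDeletionKeys(int batchSize) {
        List<String> pending = new ArrayList<>();
        String resumeAfter = null; // last key seen in the previous batch
        while (true) {
            int scanned = 0;
            lock.readLock().lock();
            lockAcquisitions.incrementAndGet();
            try {
                // Resume strictly after the last key of the previous batch.
                NavigableMap<String, String> rest = (resumeAfter == null)
                        ? table
                        : table.tailMap(resumeAfter, false);
                for (String key : rest.keySet()) {
                    if (key.startsWith(DELETING_PREFIX)) {
                        pending.add(key);
                    }
                    resumeAfter = key;
                    if (++scanned == batchSize) {
                        break;
                    }
                }
            } finally {
                // Writers waiting on the lock can run before the next batch.
                lock.readLock().unlock();
            }
            if (scanned < batchSize) {
                return pending; // reached the end of the table
            }
        }
    }

    public static void main(String[] args) {
        BatchedDeletionScan scan = new BatchedDeletionScan();
        for (int i = 0; i < 1000; i++) {
            String prefix = (i % 10 == 0) ? DELETING_PREFIX : "";
            scan.put(prefix + "key" + i, "value");
        }
        List<String> pending = scan.getPendingDeletionKeys(100);
        System.out.println(pending.size() + " pending keys, "
                + scan.lockAcquisitions() + " lock acquisitions");
    }
}
```

The trade-off in this sketch is that the result is no longer a single consistent snapshot: keys written between two batches may or may not be seen, depending on where they sort relative to the resume position. For a background deletion service that runs periodically this is probably acceptable, since anything missed would be picked up by a later run.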



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

