flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint
Date Thu, 31 May 2018 11:59:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496419#comment-16496419
] 

ASF GitHub Bot commented on FLINK-8790:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5582#discussion_r192074489
  
    --- Diff: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java
---
    @@ -791,20 +784,215 @@ private void restoreInstance(
     		}
     
     		/**
    -		 * Recovery from local incremental state.
    +		 * Recovery from multi incremental states.
    +		 * In case of rescaling, this method creates a temporary RocksDB instance for a key-groups
shard. All contents
    +		 * from the temporary instance are copied into the real restore instance and then the
temporary instance is
    +		 * discarded.
     		 */
    -		private void restoreInstance(IncrementalLocalKeyedStateHandle localKeyedStateHandle)
throws Exception {
    +		void restoreFromMultiHandles(Collection<KeyedStateHandle> restoreStateHandles)
throws Exception {
    +
    +			KeyGroupRange targetKeyGroupRange = stateBackend.keyGroupRange;
    +
    +			chooseTheBestStateHandleToInit(restoreStateHandles, targetKeyGroupRange);
    +
    +			int targetStartKeyGroup = stateBackend.getKeyGroupRange().getStartKeyGroup();
    +			byte[] targetStartKeyGroupPrefixBytes = new byte[stateBackend.keyGroupPrefixBytes];
    +			for (int j = 0; j < stateBackend.keyGroupPrefixBytes; ++j) {
    +				targetStartKeyGroupPrefixBytes[j] = (byte) (targetStartKeyGroup >>> ((stateBackend.keyGroupPrefixBytes
- j - 1) * Byte.SIZE));
    +			}
    +
    +			for (KeyedStateHandle rawStateHandle : restoreStateHandles) {
    +
    +				if (!(rawStateHandle instanceof IncrementalKeyedStateHandle)) {
    +					throw new IllegalStateException("Unexpected state handle type, " +
    +						"expected " + IncrementalKeyedStateHandle.class +
    +						", but found " + rawStateHandle.getClass());
    +				}
    +
    +				Path temporaryRestoreInstancePath = new Path(stateBackend.instanceBasePath.getAbsolutePath()
+ UUID.randomUUID().toString());
    +				try (RestoredDBInfo tmpRestoreDBInfo = restoreDBFromStateHandle(
    +						(IncrementalKeyedStateHandle) rawStateHandle,
    +						temporaryRestoreInstancePath,
    +						targetKeyGroupRange,
    +						stateBackend.keyGroupPrefixBytes)) {
    +
    +					List<ColumnFamilyDescriptor> tmpColumnFamilyDescriptors = tmpRestoreDBInfo.columnFamilyDescriptors;
    +					List<ColumnFamilyHandle> tmpColumnFamilyHandles = tmpRestoreDBInfo.columnFamilyHandles;
    +
    +					// iterating only the requested descriptors automatically skips the default column
family handle
    +					for (int i = 0; i < tmpColumnFamilyDescriptors.size(); ++i) {
    +						ColumnFamilyHandle tmpColumnFamilyHandle = tmpColumnFamilyHandles.get(i);
    +						ColumnFamilyDescriptor tmpColumnFamilyDescriptor = tmpColumnFamilyDescriptors.get(i);
    +
    +						ColumnFamilyHandle targetColumnFamilyHandle = getOrRegisterColumnFamilyHandle(
    +							tmpColumnFamilyDescriptor, null, tmpRestoreDBInfo.stateMetaInfoSnapshots.get(i));
    +
    +						try (RocksIterator iterator = tmpRestoreDBInfo.db.newIterator(tmpColumnFamilyHandle))
{
    +
    +							iterator.seek(targetStartKeyGroupPrefixBytes);
    +							while (iterator.isValid()) {
    +								// DB has been clip by target key group range, so we do not need to do key rang
check in this loop
    +								stateBackend.db.put(targetColumnFamilyHandle, iterator.key(), iterator.value());
    +								iterator.next();
    +							}
    +						} // releases native iterator resources
    +					}
    +				} finally {
    +					FileSystem restoreFileSystem = temporaryRestoreInstancePath.getFileSystem();
    +					if (restoreFileSystem.exists(temporaryRestoreInstancePath)) {
    +						restoreFileSystem.delete(temporaryRestoreInstancePath, true);
    +					}
    +				}
    +			}
    +		}
    +
    +		private class RestoredDBInfo implements AutoCloseable {
    --- End diff --
    
    This name is not optimal because the class is more than pure info. It holds the temporary
DB and is used to manage it's lifecycle. I would suggest `RestoreDBInstance` or `RestoreDBHandle`.


> Improve performance for recovery from incremental checkpoint
> ------------------------------------------------------------
>
>                 Key: FLINK-8790
>                 URL: https://issues.apache.org/jira/browse/FLINK-8790
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.6.0
>
>
> When there are multi state handle to be restored, we can improve the performance as follow:
> 1. Choose the best state handle to init the target db
> 2. Use the other state handles to create temp db, and clip the db according to the target
key group range (via rocksdb.deleteRange()), this can help use get rid of the `key group check`
in 
>  `data insertion loop` and also help us get rid of traversing the useless record.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message