flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8360) Implement task-local state recovery
Date Wed, 14 Feb 2018 13:48:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364056#comment-16364056 ]

ASF GitHub Bot commented on FLINK-8360:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5239#discussion_r168177813
  
    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/state/TaskExecutorLocalStateStoresManagerTest.java ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.runtime.state;
    +
    +import org.apache.flink.api.common.JobID;
    +import org.apache.flink.configuration.ConfigConstants;
    +import org.apache.flink.configuration.Configuration;
    +import org.apache.flink.runtime.clusterframework.types.ResourceID;
    +import org.apache.flink.runtime.jobgraph.JobVertexID;
    +import org.apache.flink.runtime.taskexecutor.TaskManagerServices;
    +import org.apache.flink.runtime.taskexecutor.TaskManagerServicesConfiguration;
    +
    +import org.junit.Assert;
    +import org.junit.Test;
    +import org.junit.rules.TemporaryFolder;
    +
    +import java.io.File;
    +import java.net.InetAddress;
    +
    +public class TaskExecutorLocalStateStoresManagerTest {
    +
    +	/**
    +	 * This tests that the creation of {@link TaskManagerServices} correctly creates the local state root directory
    +	 * for the {@link TaskExecutorLocalStateStoresManager} with the configured root directory.
    +	 */
    +	@Test
    +	public void testCreationFromConfig() throws Exception {
    +
    +		final Configuration config = new Configuration();
    +
    +		final String rootDirString = "localStateRoot";
    +		config.setString(ConfigConstants.TASK_MANAGER_LOCAL_STATE_ROOT_DIR_KEY, rootDirString);
    +
    +		final ResourceID tmResourceID = ResourceID.generate();
    +
    +		TaskManagerServicesConfiguration taskManagerServicesConfiguration =
    +			TaskManagerServicesConfiguration.fromConfiguration(config, InetAddress.getLocalHost(), true);
    +
    +		TaskManagerServices taskManagerServices =
    +			TaskManagerServices.fromConfiguration(taskManagerServicesConfiguration, tmResourceID);
    +
    +		TaskExecutorLocalStateStoresManager taskStateManager = taskManagerServices.getTaskStateManager();
    +
    +		Assert.assertEquals(
    +			new File(rootDirString, TaskManagerServices.LOCAL_STATE_SUB_DIRECTORY_ROOT),
    +			taskStateManager.getLocalStateRootDirectory());
    +
    +		Assert.assertEquals("localState", TaskManagerServices.LOCAL_STATE_SUB_DIRECTORY_ROOT);
    +	}
    +
    +	/**
    +	 * This tests that the creation of {@link TaskManagerServices} correctly falls back to the first tmp directory of
    +	 * the IOManager as default for the local state root directory.
    +	 */
    +	@Test
    +	public void testCreationFromConfigDefault() throws Exception {
    +
    +		final Configuration config = new Configuration();
    +
    +		final ResourceID tmResourceID = ResourceID.generate();
    +
    +		TaskManagerServicesConfiguration taskManagerServicesConfiguration =
    +			TaskManagerServicesConfiguration.fromConfiguration(config, InetAddress.getLocalHost(), true);
    +
    +		TaskManagerServices taskManagerServices =
    +			TaskManagerServices.fromConfiguration(taskManagerServicesConfiguration, tmResourceID);
    +
    +		TaskExecutorLocalStateStoresManager taskStateManager = taskManagerServices.getTaskStateManager();
    +
    +		Assert.assertEquals(
    +			new File(taskManagerServicesConfiguration.getTmpDirPaths()[0], TaskManagerServices.LOCAL_STATE_SUB_DIRECTORY_ROOT),
    +			taskStateManager.getLocalStateRootDirectory());
    +	}
    +
    +	/**
    +	 * This tests that the {@link TaskExecutorLocalStateStoresManager} creates {@link TaskLocalStateStore} that have
    +	 * a properly initialized local state base directory.
    +	 */
    +	@Test
    +	public void testSubtaskStateStoreDirectoryCreation() throws Exception {
    +
    +		JobID jobID = new JobID();
    +		JobVertexID jobVertexID = new JobVertexID();
    +		int subtaskIdx = 42;
    +		TemporaryFolder tmp = new TemporaryFolder();
    --- End diff --
    
    👍 


> Implement task-local state recovery
> -----------------------------------
>
>                 Key: FLINK-8360
>                 URL: https://issues.apache.org/jira/browse/FLINK-8360
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Major
>             Fix For: 1.5.0
>
>
> This issue tracks the development of recovery from task-local state. The main idea is to have a secondary, local copy of the checkpointed state, while there is still a primary copy in DFS that we report to the checkpoint coordinator.
> Recovery can attempt to restore from the secondary local copy, if available, to save network bandwidth. This requires that the assignment from tasks to slots is as sticky as possible.
> For starters, we will implement this feature for all managed keyed states and can easily extend it to all other state types (e.g. operator state) later.
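
The recovery order described in the issue (try the secondary local copy first, fall back to the primary copy in DFS) could be sketched roughly as follows. This is only an illustration and is not taken from the pull request: `LocalRecoverySketch`, `SnapshotStore`, and `restore` are hypothetical names, and plain in-memory maps stand in for the local directory and the DFS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of local-first recovery; none of these names are Flink APIs. */
public class LocalRecoverySketch {

    /** In-memory stand-in for a place where checkpointed state is kept (assumption). */
    static final class SnapshotStore {
        private final Map<Long, String> snapshots = new HashMap<>();

        void put(long checkpointId, String state) {
            snapshots.put(checkpointId, state);
        }

        Optional<String> get(long checkpointId) {
            return Optional.ofNullable(snapshots.get(checkpointId));
        }
    }

    /**
     * Restores state for a checkpoint: uses the secondary local copy if it is
     * available, otherwise reads the primary copy (the one that would live in DFS).
     */
    static String restore(long checkpointId, SnapshotStore local, SnapshotStore primary) {
        return local.get(checkpointId)
            .orElseGet(() -> primary.get(checkpointId)
                .orElseThrow(() -> new IllegalStateException(
                    "No state found for checkpoint " + checkpointId)));
    }
}
```

In this sketch the local copy is purely an optimization: if a task is rescheduled to a slot without the local copy, the lookup misses and the primary copy is used, which is why sticky task-to-slot assignment matters for the bandwidth savings.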



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
