hadoop-hdfs-issues mailing list archives

From "Xiaoyu Yao (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-2010) PipelineID management for multi-raft, in SCM or in datanode?
Date Thu, 29 Aug 2019 03:06:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918239#comment-16918239 ]

Xiaoyu Yao commented on HDDS-2010:

I would prefer option 1 for better scalability. Also, SCM always builds its in-memory pipeline
map from the pipeline reports sent by the DNs anyway.
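The point above — that SCM can always rebuild its pipeline view from DN pipeline reports — might be sketched roughly as follows. This is illustrative Java only; `PipelineMapSketch` and its method names are assumptions, not actual SCM code.

```java
import java.util.*;

// Illustrative sketch (not Ozone code): SCM rebuilding its in-memory
// datanode -> pipeline map purely from pipeline reports sent by datanodes.
class PipelineMapSketch {
    // datanode UUID -> set of pipeline IDs that datanode participates in
    private final Map<String, Set<String>> dnToPipelines = new HashMap<>();

    // Called when a pipeline report arrives with a datanode heartbeat;
    // the freshly reported set supersedes the stale view.
    void onPipelineReport(String datanodeId, List<String> reportedPipelineIds) {
        dnToPipelines.put(datanodeId, new HashSet<>(reportedPipelineIds));
    }

    Set<String> pipelinesOf(String datanodeId) {
        return dnToPipelines.getOrDefault(datanodeId, Collections.emptySet());
    }

    public static void main(String[] args) {
        PipelineMapSketch scm = new PipelineMapSketch();
        scm.onPipelineReport("dn-1", Arrays.asList("p-1", "p-2"));
        // A later report replaces the earlier view entirely.
        scm.onPipelineReport("dn-1", Arrays.asList("p-2", "p-3"));
        System.out.println(scm.pipelinesOf("dn-1"));
    }
}
```

Because the map is derived state, SCM needs no durable copy of the mapping: after a restart it converges to the correct view as reports arrive.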


We also need another Jira to change the current pipeline creation logic:

Currently, SCM talks directly to the DNs to create a pipeline, on the assumption that there are
pending reads/writes that need to use the pipeline right away.

We should change this to deliver pipeline creation/destruction through the DN heartbeat response
model. That way we get better SCM scalability.
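The heartbeat-response model could be sketched roughly as below: SCM queues commands per datanode and piggybacks them on the next heartbeat response instead of opening a direct RPC. This is a minimal sketch; `HeartbeatCommandQueue` and the string-valued commands are assumptions, not Ozone APIs.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch (not Ozone code) of the proposed model: SCM queues
// pipeline create/destroy commands per datanode and returns them in the
// next heartbeat response rather than calling the datanode directly.
class HeartbeatCommandQueue {
    private final Map<String, Queue<String>> pending = new ConcurrentHashMap<>();

    // SCM side: schedule a command for a datanode, e.g. "CREATE_PIPELINE p-7".
    void schedule(String datanodeId, String command) {
        pending.computeIfAbsent(datanodeId, k -> new ConcurrentLinkedQueue<>())
               .add(command);
    }

    // Heartbeat handler: drain every pending command into the response,
    // so the datanode acts on them when its heartbeat returns.
    List<String> drainForHeartbeat(String datanodeId) {
        Queue<String> q = pending.getOrDefault(datanodeId, new ArrayDeque<>());
        List<String> response = new ArrayList<>();
        String cmd;
        while ((cmd = q.poll()) != null) {
            response.add(cmd);
        }
        return response;
    }
}
```

The scalability win is that SCM never initiates connections: it only reacts to heartbeats it would process anyway, so its outbound fan-out no longer grows with pipeline churn.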


cc: [~anu].

> PipelineID management for multi-raft, in SCM or in datanode?
> ------------------------------------------------------------
>                 Key: HDDS-2010
>                 URL: https://issues.apache.org/jira/browse/HDDS-2010
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: Ozone Datanode
>            Reporter: Li Cheng
>            Assignee: Li Cheng
>            Priority: Major
>             Fix For: 0.5.0
> With the intention to support multi-raft, I want to bring up a question about how the unique
pipeline IDs should be managed. Since every datanode can be a member of multiple Raft pipelines,
the pipeline IDs need to be persisted with the datanode for recovery purposes (we can talk about
recovery later). Generally, there are two options:
>  # Store in the datanode (like datanodeDetails): every time the pipeline mapping changes on
a single datanode, the pipeline IDs are serialized to a local file. This leads to many more
local serializations of things like datanodeDetails, but each update covers only the local
datanode's change. An improvement could be to link a serializable object to datanodeDetails and
have the datanode write new pipeline IDs to that object instead of the details file. On the
other hand, since the pipeline IDs are stored only locally on the datanode, there is no global
view in SCM. (Or we can store a lazy copy?)
>  # Store in SCM: SCM maintains a large mapping between datanode IDs and pipeline IDs. But
this leads to a sharply increasing frequency of SCM updates, since the pipeline mapping changes
are far more complex and happen all the time. Obviously this puts a lot of pressure on SCM, but
it also gives SCM a global view for managing datanodes and multi-raft pipelines.
> Thoughts? [~xyao] [~Sammi] 
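Option 1 in the quoted description might look roughly like the following: the datanode rewrites a small local file next to its datanodeDetails whenever its pipeline membership changes, and reloads it on restart. A minimal sketch; `LocalPipelineStore` and the one-ID-per-line file layout are assumptions, not Ozone code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;

// Illustrative sketch (not Ozone code) of option 1: persist the datanode's
// current pipeline IDs to a local file on every membership change, and
// reload them on restart for recovery.
class LocalPipelineStore {
    private final Path file;

    LocalPipelineStore(Path file) {
        this.file = file;
    }

    // Rewrite the whole file on each membership change (sorted, one ID per line).
    void save(Set<String> pipelineIds) {
        try {
            Files.write(file, new TreeSet<>(pipelineIds));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On restart, recover the last known membership (empty if no file yet).
    Set<String> load() {
        try {
            if (!Files.exists(file)) {
                return Collections.emptySet();
            }
            return new HashSet<>(Files.readAllLines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"), "pipeline-ids-sketch.txt");
        LocalPipelineStore store = new LocalPipelineStore(tmp);
        store.save(new HashSet<>(Arrays.asList("p-1", "p-2")));
        System.out.println(store.load());
    }
}
```

Since pipeline membership changes are infrequent per datanode, rewriting one small file is cheap locally; the trade-off, as the description notes, is that SCM loses the global view unless it rebuilds one from reports.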

This message was sent by Atlassian Jira

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
