hadoop-hdfs-issues mailing list archives

From "Nanda kumar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDDS-726) Ozone Client should update SCM to move the container out of allocation path in case a write transaction fails
Date Wed, 27 Feb 2019 17:37:01 GMT

    [ https://issues.apache.org/jira/browse/HDDS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779554#comment-16779554
] 

Nanda kumar edited comment on HDDS-726 at 2/27/19 5:36 PM:
-----------------------------------------------------------

[~shashikant], I'm still going through the patch. Please find my initial comments below.

It looks like this patch loses the performance optimization done in HDDS-1106.
In {{PipelineStateMap}}, instead of iterating over the complete {{pipelineMap}}, we can call
{{PipelineStateMap#getPipelines(type, factor, state)}} and apply the exclude filter to the result.
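
A minimal sketch of that idea, assuming a simplified model: the nested {{Pipeline}} class, the string-keyed {{index}} map and the {{OPEN}} state are hypothetical stand-ins, not the real {{PipelineStateMap}} internals. The point is only that the exclude filter runs over the already narrowed result rather than over every pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model; not the real PipelineStateMap.
final class PipelineFilterSketch {
  static final class Pipeline {
    final String id;
    final String type;
    final String factor;
    final String state;
    final List<String> datanodes;

    Pipeline(String id, String type, String factor, String state,
        String... datanodes) {
      this.id = id;
      this.type = type;
      this.factor = factor;
      this.state = state;
      this.datanodes = Arrays.asList(datanodes);
    }
  }

  // Assumed index keyed by type/factor/state, standing in for whatever
  // lookup getPipelines(type, factor, state) uses in the real class.
  private final Map<String, List<Pipeline>> index = new HashMap<>();

  private static String key(String type, String factor, String state) {
    return type + "/" + factor + "/" + state;
  }

  void add(Pipeline p) {
    index.computeIfAbsent(key(p.type, p.factor, p.state),
        k -> new ArrayList<>()).add(p);
  }

  List<Pipeline> getPipelines(String type, String factor, String state) {
    return index.getOrDefault(key(type, factor, state),
        Collections.emptyList());
  }

  // Apply the exclude filter to the narrowed result only.
  List<Pipeline> getPipelines(String type, String factor, String state,
      Set<String> excludedPipelines, Set<String> excludedDatanodes) {
    return getPipelines(type, factor, state).stream()
        .filter(p -> !excludedPipelines.contains(p.id))
        .filter(p -> Collections.disjoint(p.datanodes, excludedDatanodes))
        .collect(Collectors.toList());
  }

  // Returns the IDs of the pipelines that survive the exclude filter.
  static List<String> demo() {
    PipelineFilterSketch map = new PipelineFilterSketch();
    map.add(new Pipeline("P1", "RATIS", "THREE", "OPEN", "dn1", "dn2", "dn3"));
    map.add(new Pipeline("P2", "RATIS", "THREE", "OPEN", "dn4", "dn5", "dn6"));
    map.add(new Pipeline("P3", "RATIS", "THREE", "OPEN", "dn7", "dn8", "dn9"));
    Set<String> excludedPipelines = new HashSet<>(Arrays.asList("P2"));
    Set<String> excludedDatanodes = new HashSet<>(Arrays.asList("dn1", "dn3"));
    return map.getPipelines("RATIS", "THREE", "OPEN",
            excludedPipelines, excludedDatanodes)
        .stream().map(p -> p.id).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```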
----
 

Suggestion: In the {{ExcludeList}} class, it would be more intuitive to use {{List<DatanodeDetails>}}
and {{List<ContainerID>}} in place of {{List<UUID>}} and {{List<Long>}},
respectively.
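
As a rough illustration of the typed variant: the nested {{DatanodeDetails}} and {{ContainerID}} classes below are simplified stand-ins written for this sketch, and the method names are hypothetical, not the API in the patch. With distinct types, a container ID can no longer be passed where a datanode is expected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch; the nested types are simplified stand-ins.
final class TypedExcludeListSketch {
  static final class ContainerID {
    private final long id;

    ContainerID(long id) {
      this.id = id;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ContainerID && ((ContainerID) o).id == id;
    }

    @Override
    public int hashCode() {
      return Long.hashCode(id);
    }
  }

  static final class DatanodeDetails {
    private final UUID uuid;

    DatanodeDetails(UUID uuid) {
      this.uuid = uuid;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof DatanodeDetails
          && ((DatanodeDetails) o).uuid.equals(uuid);
    }

    @Override
    public int hashCode() {
      return uuid.hashCode();
    }
  }

  // Typed lists instead of List<UUID> and List<Long>.
  private final List<DatanodeDetails> datanodes = new ArrayList<>();
  private final List<ContainerID> containerIds = new ArrayList<>();

  void addDatanode(DatanodeDetails dn) {
    datanodes.add(dn);
  }

  void addContainerId(ContainerID id) {
    containerIds.add(id);
  }

  boolean isExcluded(DatanodeDetails dn) {
    return datanodes.contains(dn);
  }

  boolean isExcluded(ContainerID id) {
    return containerIds.contains(id);
  }
}
```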
----
 

In {{BlockManagerImpl#allocateBlock}} we will be stuck forever inside the while loop in scenarios
like the one below.
{noformat}
SCM State:

Pipelines: {
  P1 [dn1, dn2, dn3], 
  P2 [dn4, dn5, dn6], 
  P3 [dn7, dn8, dn9]
}

Containers: {
	C1 [P1, available space: 500MB],
	C2 [P1, available space: 500MB],
	C3 [P1, available space: 500MB],
	C4 [P2, available space: 500MB],
	C5 [P2, available space: 500MB],
	C6 [P2, available space: 500MB],
	C7 [P3, available space: 500MB],
	C8 [P3, available space: 500MB],
	C9 [P3, available space: 10MB]
}

Client:
allocateBlock {
  size: 20MB,
  replicationType: RATIS,
  replicationFactor: THREE,
  owner: XXXX,
  excludeList: {
    datanodes: [dn1, dn3],
    containerIds: [C7, C8],   
    pipelineIds: [P2]
  }
}
{noformat}
Here we will exclude pipelines P1 and P2 while choosing the pipeline: P1 contains the excluded
datanodes dn1 and dn3, and P2 is excluded explicitly. That leaves only P3.
Once we pick a pipeline, we try to allocate the block in a container from that pipeline in
a round-robin fashion, taking the space available in each container into account.

In this scenario, the {{containerManager.getMatchingContainer(size, owner, pipeline)}} call
will return either C7 or C8, as those are the containers that have enough space; C9 does not. We
also will not create any new container on this pipeline, as it already has 3 containers.

In {{BlockManagerImpl#allocateBlock:Line 207 - 209}} we will then reject C7 or C8, whichever was
picked in that iteration, because the client asked us to exclude both C7 and C8.

In this case, the while loop will continue forever.
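
The scenario can be reproduced with a small simulation. This is only a sketch under stated assumptions: {{AllocateLoopSketch}}, its round-robin cursor and the iteration cap are hypothetical and do not mirror the actual {{BlockManagerImpl}} code; the cap exists solely so the sketch terminates. {{hasEligibleContainer}} shows one possible guard, which is my assumption and not something taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the scenario; not the real BlockManagerImpl.
final class AllocateLoopSketch {
  static final class Container {
    final String id;
    final long freeMb;

    Container(String id, long freeMb) {
      this.id = id;
      this.freeMb = freeMb;
    }
  }

  // Containers of pipeline P3 from the scenario above.
  static final List<Container> P3 = Arrays.asList(
      new Container("C7", 500),
      new Container("C8", 500),
      new Container("C9", 10));

  private int next = 0; // round-robin cursor

  // Stand-in for getMatchingContainer(size, owner, pipeline): round-robin
  // over containers with enough space, unaware of the exclude list.
  Container getMatchingContainer(long sizeMb) {
    for (int i = 0; i < P3.size(); i++) {
      Container c = P3.get((next + i) % P3.size());
      if (c.freeMb >= sizeMb) {
        next = (next + i + 1) % P3.size();
        return c;
      }
    }
    return null;
  }

  // The loop as described in the comment, capped so the sketch terminates.
  // Returns the chosen container ID, or null when the cap is reached.
  String allocate(long sizeMb, Set<String> excluded, int maxIterations) {
    for (int i = 0; i < maxIterations; i++) {
      Container c = getMatchingContainer(sizeMb);
      if (c != null && !excluded.contains(c.id)) {
        return c.id;
      }
    }
    return null;
  }

  // One possible guard (an assumption, not from the patch): check up front
  // whether any container is both large enough and not excluded.
  static boolean hasEligibleContainer(long sizeMb, Set<String> excluded) {
    return P3.stream()
        .anyMatch(c -> c.freeMb >= sizeMb && !excluded.contains(c.id));
  }

  public static void main(String[] args) {
    Set<String> excluded = new HashSet<>(Arrays.asList("C7", "C8"));
    System.out.println(new AllocateLoopSketch().allocate(20, excluded, 1000));
    System.out.println(hasEligibleContainer(20, excluded));
  }
}
```

Without the cap, the loop keeps alternating between C7 and C8 and rejects both each time, which is the hang described above.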
----
 

{{BlockManagerImpl:224}}:

We can refactor

{{return containers.parallelStream().anyMatch(predicate) ? true : false}} 

to

{{return containers.parallelStream().anyMatch(predicate)}}


> Ozone Client should update SCM to move the container out of allocation path in case a
write transaction fails
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-726
>                 URL: https://issues.apache.org/jira/browse/HDDS-726
>             Project: Hadoop Distributed Data Store
>          Issue Type: Test
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDDS-726.000.patch, HDDS-726.001.patch, HDDS-726.002.patch, HDDS-726.003.patch,
HDDS-726.004.patch, HDDS-726.005.patch, HDDS-726.006.patch, HDDS-726.007.patch, HDDS-726.008.patch
>
>
> Once a container write transaction fails, it will be marked corrupted. Once the Ozone client gets an exception in such a case, it should tell SCM to move the container out of the allocation path. SCM will eventually close the container.



