hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
Date Tue, 15 Sep 2015 09:05:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745093#comment-14745093 ]

Walter Su commented on HDFS-9040:

2. If a streamer fails immediately after you add it to healthySet, does the code below
wait() endlessly? Maybe we could recalculate healthySet and use a wait with a timeout?
(Race condition between the streamer and the main thread.)
  private List<StripedDataStreamer> waitCreatingNewStreams(
      Set<StripedDataStreamer> healthyStreamers) throws IOException {
    final int expectedNum = healthyStreamers.size();
    synchronized (coordinator) {
      while (coordinator.updateStreamerMap.size() != expectedNum) {
        try {
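
A minimal sketch of the timed wait suggested in point 2, not the actual patch. The class
TimedWaitSketch, its methods, and the integer streamer ids are hypothetical stand-ins for
the coordinator and the streamers. The idea is to wait in bounded slices and recalculate
the healthy set on every pass, so a streamer that fails after being counted no longer
blocks the main thread.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical stand-in for the coordinator; not the real HDFS class.
class TimedWaitSketch {
  private final Object lock = new Object();
  private final Set<Integer> updated = new HashSet<>();

  /** Called by a streamer thread once it has finished its update. */
  void reportUpdated(int streamerId) {
    synchronized (lock) {
      updated.add(streamerId);
      lock.notifyAll();
    }
  }

  /**
   * Waits until every candidate that is still healthy has reported, or until
   * the timeout elapses. Returns the healthy candidates that have reported.
   */
  Set<Integer> waitForHealthy(Set<Integer> candidates,
      Predicate<Integer> isHealthy, long timeoutMs) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (lock) {
      while (true) {
        // Recalculate the healthy set on every pass instead of trusting
        // a size that was captured before the wait started.
        Set<Integer> healthy = new HashSet<>();
        for (Integer id : candidates) {
          if (isHealthy.test(id)) {
            healthy.add(id);
          }
        }
        if (updated.containsAll(healthy)) {
          return healthy;
        }
        long remainingNs = deadline - System.nanoTime();
        if (remainingNs <= 0) {
          healthy.retainAll(updated); // timed out: keep only those that reported
          return healthy;
        }
        // Wait in short slices so a failure that never calls notify is
        // still noticed on the next pass.
        long waitMs = Math.max(1,
            Math.min(100, TimeUnit.NANOSECONDS.toMillis(remainingNs)));
        try {
          lock.wait(waitMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          healthy.retainAll(updated);
          return healthy;
        }
      }
    }
  }
}
```

The 100 ms slice is an arbitrary choice for the sketch; the caller still has to decide
what to do with streamers that were healthy but did not report before the deadline.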

3. Again, an issue about the last stripe. (Race condition between the streamer and the main
thread.) Once you trust that a streamer is healthy, you wait endlessly; the streamer then
fails and betrays you. Maybe a wait with a timeout?
  private void allocateNewBlock() throws IOException {
    if (currentBlockGroup != null) {
      for (int i = 0; i < numAllBlocks; i++) {
        if (getStripedDataStreamer(i).isHealthy()) {
          // sync all the healthy streamers before writing to the new block
          final ExtendedBlock b = coordinator.takeEndBlock(i);
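
A rough sketch of what a bounded take for point 3 could look like, assuming the end blocks
are handed over through a blocking queue. BoundedTakeSketch and takeWhileHealthy are
hypothetical names, not part of the patch. Instead of blocking forever, the main thread
polls in short slices and re-checks the streamer's health between slices.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper; not the real coordinator API.
class BoundedTakeSketch {
  /**
   * Returns the next queued item, or null if the producer became unhealthy,
   * the overall timeout elapsed, or the calling thread was interrupted.
   */
  static <T> T takeWhileHealthy(BlockingQueue<T> queue,
      BooleanSupplier isHealthy, long sliceMs, long timeoutMs) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    try {
      // Re-check health before every slice rather than only once up front.
      while (isHealthy.getAsBoolean()) {
        long remainingMs =
            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          return null; // overall timeout reached
        }
        T item = queue.poll(Math.min(sliceMs, remainingMs),
            TimeUnit.MILLISECONDS);
        if (item != null) {
          return item;
        }
      }
      return null; // the producer failed while we were waiting
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

A null return then means "this streamer cannot be trusted any more", and the caller can
treat it like any other failed streamer.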

4. (Race condition between the streamer and the main thread.) You trust that it's a healthy
streamer, then it fails immediately. You call setExternalError. Does {{internalError}} get
cleared by mistake?
  private Set<StripedDataStreamer> markExternalErrorOnStreamers() {
    Set<StripedDataStreamer> healthySet = new HashSet<>();
    for (StripedDataStreamer streamer : streamers) {
      if (streamer.isHealthy() &&
          streamer.getStage() == BlockConstructionStage.DATA_STREAMING) {
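
To illustrate the concern in point 4, here is a small sketch with a hypothetical
ErrorStateSketch class; it does not claim to match the streamer's real error state. The
health check and the external-error mark happen under one lock, and clearing the external
error never touches the internal flag, so a failure recorded in between is not lost.

```java
// Hypothetical error-state holder; names are for illustration only.
class ErrorStateSketch {
  private boolean internalError;
  private boolean externalError;

  /** Called by the streamer thread when it hits its own failure. */
  synchronized void setInternalError() {
    internalError = true;
  }

  /**
   * Check and mark in one atomic step: the external error is set only if
   * no internal error has been recorded. Returns true if it was marked.
   */
  synchronized boolean markExternalErrorIfHealthy() {
    if (internalError) {
      return false;
    }
    externalError = true;
    return true;
  }

  /** Clears only the external flag; an internal error survives. */
  synchronized void clearExternalError() {
    externalError = false;
  }

  synchronized boolean hasInternalError() {
    return internalError;
  }

  synchronized boolean hasExternalError() {
    return externalError;
  }
}
```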

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch,
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and StripedDataStreamers
> only have to stream blocks to DNs.
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
> from [~jingzhao].

This message was sent by Atlassian JIRA
