Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 21 Dec 2016 21:55:58 +0000 (UTC)
From: "James Clampffer (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13029896.1482353958000.575817.1482357358548@Atlassian.JIRA>
In-Reply-To: <JIRA.13029896.1482353958000@Atlassian.JIRA>
References: <JIRA.13029896.1482353958000@Atlassian.JIRA> <JIRA.13029896.1482353958071@arcas>
Subject: [jira] [Updated] (HDFS-11266) libhdfs++: Redesign block reader with
 simplicity and resource management in mind
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 21 Dec 2016 21:56:01 -0000


     [ https://issues.apache.org/jira/browse/HDFS-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-11266:
-----------------------------------
    Description: 
The goal here is to significantly simplify the block reader and make it much harder to introduce issues.  There are plenty of examples of these issues in the subtasks of HDFS-8707; the one that finally motivated a reimplementation is HDFS-10931.

Goals:
-The read-side protocol of the data transfer pipeline is fundamentally simple, even when done asynchronously.  The code should be equally simple.

-Get the code into a state that is easy to reason about for someone with a solid understanding of HDFS and a basic understanding of C++, or vice versa: improve comments and avoid esoteric C++ constructs.  This is a must-have in order to lower the bar to contributing.

-Get rid of dependencies on the existing continuation code.  Others and I have spent far too much time debugging both the continuation code itself and bugs introduced because the continuation code was hard to reason about.  Notable issues:
  -It's cool from a theoretical perspective, but after 18 months of working on this, it's still unclear what problem the continuation pattern helped solve that callbacks couldn't.
  -Continuations spend more time allocating memory than the rest of the code spends doing real work - seriously, profile it.  This can't be fixed because the Pipeline takes ownership of all Continuation objects and then deletes them.
  -The way the block reader actually uses them is a hybrid of a state machine, continuations, and direct use of asio callbacks to bounce between the two.

Proposed approach:
Still have a BlockReader class that owns a PacketReader class; the packet reader is analogous to the ReadPacketContinuation that the BlockReader builds now.  The difference is that none of this will be stitched together at runtime using continuations: the block reader simply has a member packet reader that gets allocated up front.  The PacketReader can be recycled in order to avoid allocations.  The block reader is only responsible for requesting block info; after that it keeps invoking the PacketReader until enough data has been read.
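
The ownership described above can be sketched roughly as follows.  This is only an illustration under assumptions: the interfaces, the Reset and ReadPacket method names, and the fixed packet size are hypothetical, and the reads are synchronous stand-ins for the real asynchronous IO (the block info request is left out entirely).  The point is that the packet reader is a plain member, allocated once with the block reader and reset between packets rather than rebuilt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packet reader: owns its buffer and is reused for every packet.
class PacketReader {
 public:
  explicit PacketReader(std::size_t max_packet) : max_packet_(max_packet) {
    buf_.reserve(max_packet_);  // the only allocation, done up front
  }

  // Reset state for the next packet; capacity is kept, so nothing is freed.
  void Reset() { buf_.clear(); }

  // Stand-in for reading one packet off the wire: fills the buffer with at
  // most max_packet_ bytes, limited by what the caller still wants.
  std::size_t ReadPacket(std::size_t wanted) {
    std::size_t n = std::min(wanted, max_packet_);
    buf_.assign(n, 'x');  // n <= reserved capacity, so no reallocation
    return n;
  }

  const std::vector<char> &data() const { return buf_; }

 private:
  std::size_t max_packet_;
  std::vector<char> buf_;
};

// Hypothetical block reader: the packet reader is a member, not something
// built and torn down at runtime.
class BlockReader {
 public:
  explicit BlockReader(std::size_t max_packet) : packet_reader_(max_packet) {}

  // Keep invoking the packet reader until len bytes have been read.
  std::vector<char> ReadBlock(std::size_t len) {
    std::vector<char> out;
    out.reserve(len);
    packets_read_ = 0;
    while (out.size() < len) {
      packet_reader_.Reset();
      std::size_t n = packet_reader_.ReadPacket(len - out.size());
      assert(n > 0);  // the stand-in needs a non-zero packet size
      const std::vector<char> &pkt = packet_reader_.data();
      out.insert(out.end(), pkt.begin(), pkt.end());
      ++packets_read_;
    }
    return out;
  }

  std::size_t packets_read() const { return packets_read_; }

 private:
  PacketReader packet_reader_;  // allocated together with the BlockReader
  std::size_t packets_read_ = 0;
};
```

With a packet size of 4, reading 10 bytes through this sketch goes through the same PacketReader three times (4, 4, and 2 bytes).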

Async chaining:
Move to a state-machine-based approach in which each state is represented as a method.  This allows the readers to be pinned in memory.  The asynchronous IO becomes the state transitions: a callback is supplied to the asio async call, and it jumps to the next state upon completion of the IO operation.  Epsilon transitions will be fairly rare, but if we need them to temporarily drop a lock, as is done in the RPC code, io_service::post can be used rather than a call that actually does IO.

I'm fairly confident in this approach since I have used the same approach to implement various hardware async bus interfaces in VHDL to good effect, i.e. high performance and easy to understand.  An asio callback is roughly analogous to a signal in a sensitivity list, as the methods are to process blocks.

Example state machine, using the approach described above, that sends some data and then waits to get something back, like what the current BlockReader::AsyncRequestBlock does:

{code}
#include <array>
#include <cstddef>
#include <asio.hpp>

class ExampleHandshake {
 public:
  // connecting the socket is omitted from this example
  ExampleHandshake() : socket_(service_) {}
  void SendHandshake();
 private:
  void OnHandshakeSend();
  void OnHandshakeDone();

  asio::io_service service_;
  asio::ip::tcp::socket socket_;
  // class owns any small buffers so they can be directly accessed
  // (the sizes here are placeholders)
  std::array<char, 64> request_;
  std::array<char, 64> response_;
};

void ExampleHandshake::SendHandshake() {
  // trampoline to jump into read state once write completes
  auto trampoline = [this](const asio::error_code &ec, std::size_t sz) {
    // error checking here
    this->OnHandshakeSend();
  };
  asio::async_write(socket_, asio::buffer(request_), trampoline);
}

void ExampleHandshake::OnHandshakeSend() {
  // when read completes bounce into handler
  auto trampoline = [this](const asio::error_code &ec, std::size_t sz) {
    // error checking here
    this->OnHandshakeDone();
  };
  asio::async_read(socket_, asio::buffer(response_), trampoline);
}

void ExampleHandshake::OnHandshakeDone() {
  // just finished sending request and receiving response, go do something
}
{code}
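
For the rarer epsilon transitions mentioned under async chaining, a sketch along the following lines may help.  To keep it self-contained it does not use asio: TinyLoop is a hypothetical stand-in that only mimics the part of io_service::post that matters here, namely queuing a handler to run later instead of calling it inline.  The class and state names are likewise made up for illustration.  The lock taken in StateA is released when StateA returns, and StateB only runs afterwards, from the loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the posting side of asio::io_service.
class TinyLoop {
 public:
  // Queue a handler; it runs later from Run(), never inline.
  void Post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }

  // Run queued handlers until none are left.
  void Run() {
    while (!queue_.empty()) {
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop_front();
      fn();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Two-state machine where the A -> B transition involves no IO.
class ExampleEpsilon {
 public:
  explicit ExampleEpsilon(TinyLoop &loop) : loop_(loop) {}

  void StateA() {
    std::lock_guard<std::mutex> guard(lock_);
    trace_.push_back("A");
    // Epsilon transition: no IO, just schedule the next state.
    loop_.Post([this]() { this->StateB(); });
    trace_.push_back("A-exit");
  }  // lock_ is released here, before StateB runs

  const std::vector<std::string> &trace() const { return trace_; }

 private:
  void StateB() {
    // Taking lock_ again inline from StateA on the same thread would be
    // undefined behavior (typically a deadlock); posting avoids that.
    std::lock_guard<std::mutex> guard(lock_);
    trace_.push_back("B");
  }

  TinyLoop &loop_;
  std::mutex lock_;
  std::vector<std::string> trace_;
};
```

Calling StateA() and then TinyLoop::Run() records the states in the order A, A-exit, B, which is the ordering the real code would rely on when dropping a lock between states.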


  was:
The goal here is to significantly simplify the block reader and make it much harder to introduce issues.  There are plenty of examples of these issues in the subtasks of  HDFS-8707, the one that finally motivated a reimplementation is HDFS-10931.

Goals:
-The read side protocol of the data transfer pipeline is fundamentally really simple (even if done asynchronously).  The code should be equally simple.

-Get the code in a state that should be easy enough to reason about with a solid understanding of HDFS and basic understanding of C++ and vice versa: improve comments and avoid using esoteric C++ constructs.  This is a must-have in order to lower the bar to contribute.

-Get rid of dependencies on the existing continuation stuff.  Myself and others have spent far too much time debugging both the continuation code and bugs introduced because the continuation code was hard to reason about.  Notable issues:
  -It's cool from a theoretical perspective, but after 18 months of working on this it's still unclear what problem the continuation idiom helped solve.
  -They spend more time allocating memory than the rest of the code does doing real work - seriously, profile it.  This can't be fixed because the Pipeline takes ownership of all Continuation objects and then deletes them.
  -The way the block reader really uses them is a hybrid of a state machine, continuations, and directly using asio callbacks to bounce between the two.

Proposed approach:
Still have a BlockReader class that owns a PacketReader class, the packet reader is analogous to the ReadPacketContinuation that the BlockReader builds now.  The difference is that none of this will be stitched together at runtime using continuations, and once we have a block reader with a member packet reader that gets allocated up front.  The PacketReader can be recycled in order to avoid allocations.  The block reader is only responsible for requesting block info, after that it keeps invoking the PacketReader until enough data has been read.

Async chaining:
Move to a state machine based approach.  This allows the readers to be pinned in memory, where each state is represented as a method.  The asynchronous IO becomes the state transitions.  A callback is supplied to the asio async call that jumps to the next state upon completion of the IO operation.  Epsilon transitions will be fairly rare, but if we need them to temporarily drop a lock as is done in the RPC code io_service::post can be used rather than a call that actually does IO.

I'm fairly confident in this approach since I used the same to implement various async bus interfaces in VHDL to good effect i.e. high performance and easy to understand.  An asio callback is roughly analogous to a signal in a sensitivity list as the methods are to process blocks.

Example state machine that would send some stuff, then wait to get something back like what the current BlockReader::AsyncRequestBlock does using the approach described above.

{code}
class ExampleHandshake {
  // class would own any small buffers so they can be directly accessed
 public:
  void SendHandshake();
 private:
  void OnHandshakeSend();
  void OnHandShakeDone();

  asio::io_service service_;
  asio::ip::tcp::socket socket_;
}

void ExampleHandshake::SendHandshake() {
  // trampoline to jump into read state once write completes
  auto trampoline[this](asio::error_code ec, size_t sz) {
    //error checking here
   this->OnHandshakeSend();
  };
  asio::write(service_, socket_, asio buffer of data here, trampoline);
}

void ExampleHandshake::OnHandshakeSend() {
  // when read completes bounce into handler
  auto trampoline = [this](asio::error_code ec, size_t sz) {
    this->OnHandshakeDone();
  };
  asio::read(service_, socket_, asio buffer for received data, trampoline);
}

void ExampleHandshake::OnHandshakeDone() {
  //just finished sending request, and receiving response, go do something
}
{code}


> libhdfs++: Redesign block reader with simplicity and resource management in mind
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11266
>                 URL: https://issues.apache.org/jira/browse/HDFS-11266
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer

