hadoop-hdfs-dev mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-11266) libhdfs++: Redesign block reader with simplicity and resource management in mind
Date Wed, 21 Dec 2016 20:59:58 GMT
James Clampffer created HDFS-11266:

             Summary: libhdfs++: Redesign block reader with simplicity and resource management
in mind
                 Key: HDFS-11266
                 URL: https://issues.apache.org/jira/browse/HDFS-11266
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: James Clampffer
            Assignee: James Clampffer

The goal here is to significantly simplify the block reader and make it much harder to introduce
issues.  There are plenty of examples of these issues in the subtasks of HDFS-8707; the one
that finally motivated a reimplementation is HDFS-10931.

-The read side protocol of the data transfer pipeline is fundamentally really simple (even
if done asynchronously).  The code should be equally simple.

-Get the code into a state that is easy to reason about for someone with a solid understanding
of HDFS and a basic understanding of C++, or vice versa: improve comments and avoid using esoteric
C++ constructs.  This is a must-have in order to lower the bar to contribute.

-Get rid of dependencies on the existing continuation code.  Others and I have spent
far too much time debugging both the continuation code itself and bugs introduced because the
continuation code was hard to reason about.  Notable issues:
  -It's cool from a theoretical perspective, but after 18 months of working on this it's still
unclear what problem the continuation idiom helped solve.
  -The continuations spend more time allocating memory than the rest of the code spends doing
real work - seriously, profile it.  This can't be fixed because the Pipeline takes ownership of
all Continuation objects and then deletes them.
  -The way the block reader really uses them is a hybrid of a state machine, continuations,
and directly using asio callbacks to bounce between the two.

Proposed approach:
Still have a BlockReader class that owns a PacketReader class; the packet reader is analogous
to the ReadPacketContinuation that the BlockReader builds now.  The difference is that none
of this will be stitched together at runtime using continuations: the block reader simply has
a member packet reader that gets allocated up front.  The PacketReader can be
recycled in order to avoid allocations.  The block reader is only responsible for requesting
block info; after that it keeps invoking the PacketReader until enough data has been read.
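
To make the ownership model concrete, here is a minimal sketch, not the actual libhdfs++
code.  The class shapes, method names, and the fake ReadPacket that just fills a buffer
instead of doing network IO are all assumptions made for illustration.  The point is only
that the packet reader is a plain member whose buffer is allocated once and reused.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the proposed PacketReader. It owns its buffer so
// the storage survives across packets instead of being reallocated each time.
class PacketReader {
 public:
  explicit PacketReader(std::size_t max_packet_size) {
    buffer_.reserve(max_packet_size);  // the only allocation, done up front
  }

  // Prepare for the next packet; clear() keeps the reserved capacity.
  void Reset() { buffer_.clear(); }

  // Stand-in for real network IO: pretend a packet of n bytes arrived.
  std::size_t ReadPacket(std::size_t n) {
    assert(n <= buffer_.capacity());
    buffer_.assign(n, 0);
    return n;
  }

  const char *storage() const { return buffer_.data(); }

 private:
  std::vector<char> buffer_;
};

// Hypothetical BlockReader: the PacketReader is a member, so nothing is
// stitched together at runtime.
class BlockReader {
 public:
  explicit BlockReader(std::size_t max_packet_size)
      : max_packet_size_(max_packet_size), packet_reader_(max_packet_size) {}

  // Keep invoking the packet reader until `wanted` bytes have been read.
  // Returns the number of packets consumed.
  std::size_t ReadBlock(std::size_t wanted) {
    std::size_t packets = 0;
    std::size_t total = 0;
    while (total < wanted) {
      packet_reader_.Reset();  // recycle instead of reallocating
      std::size_t chunk = std::min(max_packet_size_, wanted - total);
      total += packet_reader_.ReadPacket(chunk);
      ++packets;
    }
    return packets;
  }

  const PacketReader &packet_reader() const { return packet_reader_; }

 private:
  std::size_t max_packet_size_;
  PacketReader packet_reader_;
};
```

In this sketch the buffer address reported by storage() stays the same across any number of
ReadBlock calls, which is the property the recycling is meant to give.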

Async chaining:
Move to a state machine based approach where each state is represented as a method.  This
allows the readers to be pinned in memory.  The asynchronous IO becomes the state transitions:
a callback supplied to the asio async call jumps to the next state upon completion
of the IO operation.  Epsilon transitions will be fairly rare, but if we need them to temporarily
drop a lock, as is done in the RPC code, io_service::post can be used rather than a call that
actually does IO.
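
The control flow can be sketched without any networking.  In the block below FakeService
is a hypothetical stand-in for asio::io_service (just a queue of handlers), and ExampleReader
with its RequestBlock/ReadPacket/Done states is invented for illustration; none of these
names come from the existing code.  It only shows states as methods, completion handlers
as transitions, and a posted handler acting as an epsilon transition.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for asio::io_service: a queue of handlers that Run()
// invokes in order. Real code would use asio; this only shows control flow.
class FakeService {
 public:
  void Post(std::function<void()> handler) {
    queue_.push_back(std::move(handler));
  }
  void Run() {
    while (!queue_.empty()) {
      auto handler = std::move(queue_.front());
      queue_.pop_front();
      handler();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Each state is a method; each transition is a handler handed to the service.
class ExampleReader {
 public:
  explicit ExampleReader(FakeService &service) : service_(service) {}

  void Start() { RequestBlock(); }
  const std::vector<std::string> &trace() const { return trace_; }

 private:
  void RequestBlock() {
    trace_.push_back("RequestBlock");
    // Stand-in for an async write whose completion handler enters the next state.
    service_.Post([this]() { ReadPacket(); });
  }

  void ReadPacket() {
    trace_.push_back("ReadPacket");
    if (++packets_read_ < kPacketsWanted) {
      // Stand-in for an async read that completes and re-enters this state.
      service_.Post([this]() { ReadPacket(); });
    } else {
      // Epsilon transition: no IO, just hop to the next state through the
      // service so the current call stack (and any lock it holds) unwinds first.
      service_.Post([this]() { Done(); });
    }
  }

  void Done() {
    assert(packets_read_ == kPacketsWanted);
    trace_.push_back("Done");
  }

  static constexpr int kPacketsWanted = 2;
  FakeService &service_;
  int packets_read_ = 0;
  std::vector<std::string> trace_;
};
```

Calling Start() only runs the first state; every later state runs from inside
FakeService::Run(), so the reader object never moves and no per-transition objects
are allocated by the reader itself.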

I'm fairly confident in this approach since I used the same one to implement various async bus
interfaces in VHDL to good effect, i.e. high performance and easy to understand.  An asio callback
is roughly analogous to a signal in a sensitivity list, as the methods are to process blocks.

Example state machine that sends some data and then waits to get something back, like what
the current BlockReader::AsyncRequestBlock does, using the approach described above:

#include <asio.hpp>
#include <vector>

class ExampleHandshake {
 public:
  ExampleHandshake() : socket_(service_) {}
  void SendHandshake();

 private:
  void OnHandshakeSend();
  void OnHandshakeDone();

  asio::io_service service_;
  asio::ip::tcp::socket socket_;
  // class would own any small buffers so they can be directly accessed
  std::vector<char> request_;   // placeholder: asio buffer of data to send
  std::vector<char> response_;  // placeholder: asio buffer for received data
};

void ExampleHandshake::SendHandshake() {
  // trampoline to jump into read state once write completes
  auto trampoline = [this](const asio::error_code &ec, size_t sz) {
    // error checking here
    OnHandshakeSend();
  };
  asio::async_write(socket_, asio::buffer(request_), trampoline);
}

void ExampleHandshake::OnHandshakeSend() {
  // when read completes bounce into handler
  auto trampoline = [this](const asio::error_code &ec, size_t sz) {
    // error checking here
    OnHandshakeDone();
  };
  asio::async_read(socket_, asio::buffer(response_), trampoline);
}

void ExampleHandshake::OnHandshakeDone() {
  // just finished sending request and receiving response, go do something
}
