hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1799) Refactor log rolling and filename management out of FSEditLog
Date Sat, 30 Apr 2011 09:26:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027308#comment-13027308
] 

Ivan Kelly commented on HDFS-1799:
----------------------------------

@Todd

Why retain the JournalManager at all in that design? It adds extra object management overhead
which I think can be avoided. 
How about...

Pre-transaction naming:
{code}
class EditLogOutputStream {
  EditLogOutputStream(etc);

  void setBufferCapacity(int size);
  void close();  // closes and "finalizes"
  void abort();  // closes but marks as possibly truncated (eg after an IOE)

  EditLogOutputStream divert(); // create a new edit log pointing to edits.new
  void revert(); // revert stream from edits.new to edits

  /* Other methods already existing */
}

class FSEditLog {
  List<EditLogOutputStream> editStreams;
  List<EditLogOutputStream> badStreams;

  void open() {
    for (StorageDirectory sd : storage.getEditDirectories()) {
      editStreams.add(new EditLogFileOutputStream(sd));
    }
  }

  synchronized void rollEditLog() {
    List<EditLogOutputStream> newStreams = new ArrayList<EditLogOutputStream>();
    for (EditLogOutputStream s : editStreams) {
      newStreams.add(s.divert());
    }        

    // Bad streams can also be retried here: call divert() on each one and,
    // if it manages to create a stream, add that stream to newStreams.
    editStreams = newStreams;
  }

  synchronized void purgeEditLogs() {
    for (EditLogOutputStream s : editStreams) {
      s.revert();
    }
  }
}
{code}

Post-transaction naming, this becomes (assuming segments):
{code}
class EditLogOutputStream {
  EditLogOutputStream(etc);

  void setBufferCapacity(int size);
  void close();  // closes and "finalizes"
  void abort();  // closes but marks as possibly truncated (eg after an IOE)

  EditLogOutputStream nextSegment(long startTxnId); // create a new edit log segment starting at startTxnId
  
  /* Other methods already existing */
}

class FSEditLog {
  List<EditLogOutputStream> editStreams;
  List<EditLogOutputStream> badStreams;

  void open() {
    for (StorageDirectory sd : storage.getEditDirectories()) {
      editStreams.add(new EditLogFileOutputStream(sd));
    }
  }

  synchronized void rollEditLog() {
    List<EditLogOutputStream> newStreams = new ArrayList<EditLogOutputStream>();
    for (EditLogOutputStream s : editStreams) {
      newStreams.add(s.nextSegment(txid));
    }        

    // Bad streams can also be retried here: call nextSegment() on each one and,
    // if it manages to create a stream, add that stream to newStreams.
    editStreams = newStreams;
  }

  synchronized void purgeEditLogs() {
    // this goes away
  }
}
{code}
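
The bad-stream handling described in the rollEditLog() comment above could look roughly like the following. This is only a sketch under assumptions: RollableStream and RollingLog are hypothetical stand-ins (not HDFS classes), nextSegment() is assumed to signal failure with an IOException, and a stream that fails to roll is assumed to move to badStreams.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EditLogOutputStream; only nextSegment() is taken
// from the proposal, and the IOException on failure is an assumption.
interface RollableStream {
  RollableStream nextSegment(long startTxnId) throws IOException;
}

// Hypothetical stand-in for FSEditLog, reduced to the rolling logic.
class RollingLog {
  List<RollableStream> editStreams = new ArrayList<RollableStream>();
  List<RollableStream> badStreams = new ArrayList<RollableStream>();

  synchronized void rollEditLog(long startTxnId) {
    List<RollableStream> newStreams = new ArrayList<RollableStream>();
    List<RollableStream> stillBad = new ArrayList<RollableStream>();

    // Try healthy streams first, then give previously bad streams another chance.
    List<RollableStream> candidates = new ArrayList<RollableStream>(editStreams);
    candidates.addAll(badStreams);

    for (RollableStream s : candidates) {
      try {
        newStreams.add(s.nextSegment(startTxnId));
      } catch (IOException e) {
        // Could not open the next segment; keep the stream for a later retry.
        stillBad.add(s);
      }
    }
    editStreams = newStreams;
    badStreams = stillBad;
  }
}
```

With this shape a recovered storage directory rejoins editStreams on the next roll without any separate recovery path.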

Once divert() or nextSegment() is called, the stream it is being called on should be invalidated.
The advantages of this design are:
 - Very little code change to FSEditLog
 - Reduces number of new abstractions introduced (i.e. no JournalManager)
 - Simple to change later, if log segments are decided against.

Also, it uses the Prototype design pattern which I can't ever recall seeing used in the wild.
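
A minimal sketch of the invalidation rule stated above, assuming an in-memory stream: SegmentStream and SegmentDemo are hypothetical names used for illustration, and throwing IllegalStateException on reuse is one possible way to enforce the rule, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for EditLogOutputStream; not the HDFS class.
class SegmentStream {
  private final long startTxnId;
  private final List<String> ops = new ArrayList<String>();
  private boolean valid = true;

  SegmentStream(long startTxnId) {
    this.startTxnId = startTxnId;
  }

  long getStartTxnId() { return startTxnId; }

  boolean isValid() { return valid; }

  void write(String op) {
    checkValid();
    ops.add(op);
  }

  // Returns the stream for the next segment and invalidates this one.
  SegmentStream nextSegment(long nextStartTxnId) {
    checkValid();
    valid = false;
    return new SegmentStream(nextStartTxnId);
  }

  private void checkValid() {
    if (!valid) {
      throw new IllegalStateException("stream was invalidated by nextSegment()");
    }
  }
}

class SegmentDemo {
  public static void main(String[] args) {
    SegmentStream first = new SegmentStream(1);
    first.write("OP_ADD");

    SegmentStream second = first.nextSegment(2);
    second.write("OP_CLOSE");

    try {
      first.write("OP_DELETE"); // the old stream must no longer accept writes
    } catch (IllegalStateException e) {
      System.out.println("old stream rejected write: " + e.getMessage());
    }
  }
}
```

Each stream acts as the prototype for its successor, so FSEditLog only ever holds the list of current streams.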


@Jitendra
The LogSegment stuff is on github. It follows on from the design in this patch.

> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch, 0002-Standardised-error-pattern.patch, 0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch, HDFS-1799-all.diff, hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt
>
>
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA focuses on pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting log streams out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to circle back on the BackupNode later on this branch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
