hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12702) Add an HDFS metrics sink
Date Wed, 27 Jan 2016 01:48:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118451#comment-15118451

Karthik Kambatla commented on HADOOP-12702:

Thanks for filing and working on this, [~templedf]. Comments on the latest patch:
# FileSystemSink:
## Given the sink we are adding here as some quirks to its behavior - new directory every
hour etc., the class name FileSystemSink seems too simple. Can we capture more of the behavior
in the name? 
## Rename currentPath to currentDirPath, currentFile to currentFilePath, currentOut to currentOutStream
for clarity? 
## While reading {{BASEPATH_KEY}} from conf, there is no default value? 
## {{checkAppend}}: If appending throws an IOE that is not because of not being supported,
should we allow appending? I would think not. 
## {{rollLogDirIfNeeded}}: For readability, should we split it into two ifs - the first is
when the directories don't match. Also, the comment in the method is wrongly indented and
slightly confusing. 
if (!path.equals(currentPath)) {
  if (currentOut != null) {
     currentOut = null;
  currentPath = path;

if (currentOut == null) {
  // rest of the code
## Typo in the javadoc for createLogFile - nonExistant
## {{putMetrics}}: When throwing MetricsException, no need for a new line between setting
the message and actually throwing the exception. Also, should just have a method that takes
a message (String) and throws an exception if ignore error is not turned on. The only downside
would be the intern objects for the strings here. 
## Should {{flush}} be also invoking {{currentFSOut.hflush}}?

The tests look good. Should we play around with the allowed configs also? I am fine with not
doing that or following up in another JIRA.

> Add an HDFS metrics sink
> ------------------------
>                 Key: HADOOP-12702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12702
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 2.7.1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>         Attachments: HADOOP-12702.001.patch, HADOOP-12702.002.patch, HADOOP-12702.003.patch
> We need a metrics2 sink that can write metrics to HDFS. The sink should accept as configuration
a "directory prefix" and do the following in {{putMetrics()}}
> * Get yyyyMMddHH from current timestamp.
> * If HDFS dir "dir prefix" + yyyyMMddHH doesn't exist, create it. Close any currently
open file and create a new file called <hostname>.log in the new directory.
> * Write metrics to the current log file.
> * If a write fails, it should be fatal to the process running the sink.

This message was sent by Atlassian JIRA

View raw message