hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12759) RollingFileSystemSink should eagerly rotate directories
Date Wed, 03 Feb 2016 03:09:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129690#comment-15129690 ]

Andrew Wang commented on HADOOP-12759:

Hi Daniel, I'm coming into this fresh, so please excuse my comments as I get up to speed on
this. Overall looks good, only nitty stuff, then some questions:

* Not a fan of a test config key since it's exposed to end users. Can we use a static variable
or a VisibleForTesting setter instead? I didn't see any related tests in HDFS-9637. I'm hoping
whatever test emerges does not involve Thread.sleep, since I hate sleeping in unit tests.
* For the probing logic: instead of trying creates until we find a free file, should we list the
directory once first? Or list once after the first failed create, then probe?
* Need {{<p/>}} tags to get line breaks in class javadoc.
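
To make the list-then-probe suggestion above concrete, here is a minimal sketch. It is not taken from the patch: it uses {{java.nio.file}} on a local directory as a stand-in for the Hadoop FileSystem API, and the class name {{LogFileProbe}}, the method names, and the {{<base>.<N>}} naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; local files stand in for files on HDFS.
class LogFileProbe {

    // List the directory once and return one more than the highest numeric
    // suffix seen on files named "<base>.<N>" (0 if there are none).
    static int nextSuffixFromListing(Path dir, String base) throws IOException {
        int next = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, base + ".*")) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                String suffix = name.substring(base.length() + 1);
                try {
                    next = Math.max(next, Integer.parseInt(suffix) + 1);
                } catch (NumberFormatException e) {
                    // Suffix is not a number, so this is not one of our files.
                }
            }
        }
        return next;
    }

    // Start from the listing result, then probe upward only if another
    // writer took the same name between the listing and the create.
    static Path createNext(Path dir, String base) throws IOException {
        int n = nextSuffixFromListing(dir, base);
        while (true) {
            Path candidate = dir.resolve(base + "." + n);
            try {
                return Files.createFile(candidate); // fails if the file already exists
            } catch (FileAlreadyExistsException e) {
                n++; // lost a race; try the next suffix
            }
        }
    }
}
```

With this shape the common case costs one listing plus one create, and the create-until-success loop only runs when there is a race with another writer.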

Some high-level commentary and nits:

* In the penultimate paragraph of the class javadoc, do you know why reads fail? I'd believe
{{close}} failing if the pipeline strength falls (HDFS-4504), but reads failing after a successful
close is surprising. This is generally only an issue with small clusters.
* As an aside: since HDFS always writes one block to the local DN, this can lead to skew
if there are only one or a few writers. Just an FYI, depending on your use case.

> RollingFileSystemSink should eagerly rotate directories
> -------------------------------------------------------
>                 Key: HADOOP-12759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12759
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4664.001.patch
> The RollingFileSystemSink only rolls over to a new directory if a new metrics record
> comes in.  The issue is that HDFS does not update the file size until it's closed (HDFS-5478),
> and if no new metrics record comes in, then the file size will never be updated.
> This JIRA is to add a background thread to the sink that will eagerly close the file
> at the top of the hour.
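
For illustration only, the background thread described in the issue could be sketched as below. This is not the attached patch: the class {{HourlyFlusher}}, its methods, and the {{closeCurrentFile}} callback are hypothetical names. The delay calculation is kept as a pure function of the current time, so a test can check it directly without calling Thread.sleep.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timer that fires at the top of every hour.
class HourlyFlusher {

    // Milliseconds from 'now' to the next top of the hour. Truncation is done
    // in UTC, which matches local wall-clock hours only for zones with a
    // whole-hour offset. Exactly on the hour, this returns a full hour.
    static long millisUntilNextHour(Instant now) {
        Instant nextHour = now.truncatedTo(ChronoUnit.HOURS).plus(1, ChronoUnit.HOURS);
        return Duration.between(now, nextHour).toMillis();
    }

    // Daemon thread, so the timer never keeps the JVM alive on its own.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hourly-flusher");
            t.setDaemon(true);
            return t;
        });

    // Run closeCurrentFile at the next top of the hour and hourly after that.
    // If the task throws, scheduleAtFixedRate suppresses later runs, so the
    // callback should catch its own exceptions.
    void start(Runnable closeCurrentFile) {
        timer.scheduleAtFixedRate(
            closeCurrentFile,
            millisUntilNextHour(Instant.now()),
            TimeUnit.HOURS.toMillis(1),
            TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

A real sink would also have to synchronize the close against concurrent writes of metrics records, which this sketch leaves out.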

This message was sent by Atlassian JIRA
