hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12759) RollingFileSystemSink should eagerly rotate directories
Date Thu, 04 Feb 2016 19:09:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132810#comment-15132810 ]

Andrew Wang commented on HADOOP-12759:

bq. On the probing logic, the reason I do it that way is to get synchronization across daemons.
I let HDFS sort out who gets any given file name. If I list files first, the list of files
could change by the time I go to create the file.

I mentioned this offhand in my previous comment, but how about this: try the create once;
if it fails, list the directory to find the last element n and try n+1, then keep probing
linearly until a create succeeds. This adds no overhead in the common case (no collisions),
and we skip to the end if there is a conflict. The intent is to avoid a full linear probe.
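
To make the proposal concrete, here is a minimal sketch of the try-once, list, then probe
sequence. It is not the patch's code and does not use the Hadoop FileSystem API: the
directory is modeled by an in-memory set whose add() stands in for an atomic,
non-overwriting create, and the names LogNameProber, tryCreate, lastSuffix and claim are
all hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of "try once, then list and skip to the end, then probe linearly". */
class LogNameProber {
    // Stand-in for the directory (assumption): add() is atomic and returns false when
    // the name already exists, like a create call that refuses to overwrite.
    private final Set<String> dir = ConcurrentHashMap.newKeySet();

    /** Atomically claims a name; returns false if another writer already owns it. */
    boolean tryCreate(String name) {
        return dir.add(name);
    }

    /** Returns the highest N among existing "base.N" entries, or -1 if there are none. */
    int lastSuffix(String base) {
        String prefix = base + ".";
        int max = -1;
        for (String name : dir) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            try {
                max = Math.max(max, Integer.parseInt(name.substring(prefix.length())));
            } catch (NumberFormatException ignored) {
                // Not one of the numbered files; skip it.
            }
        }
        return max;
    }

    /** Claims a file name for this writer and returns the name that was created. */
    String claim(String base) {
        // Common case: no collision, so a single create and no listing at all.
        if (tryCreate(base)) {
            return base;
        }
        // Collision: list once and jump past the last numbered file.
        int n = lastSuffix(base) + 1;
        // The listing may already be stale, so the create itself still decides the
        // winner; keep probing upward until one succeeds.
        while (true) {
            String candidate = base + "." + n;
            if (tryCreate(candidate)) {
                return candidate;
            }
            n++;
        }
    }
}
```

The listing is only a hint for where to start. Synchronization across daemons still comes
from the atomic create, which keeps the property described in the quoted reply above.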

bq. Any heartburn about a half-second sleep?

Tolerable heartburn, but I was hoping for some solution with advancing a fake clock and then
waking up the sleeping thread. I'll still +1 though if you don't want to change this.
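
For illustration, a fake clock of the kind I have in mind could look roughly like the
sketch below. FakeClock and its methods are hypothetical names, not existing Hadoop test
utilities, and the sink's thread would need to sleep through this object instead of
calling Thread.sleep() directly.

```java
/** Manually advanced clock for tests (hypothetical; not part of Hadoop). */
class FakeClock {
    private long nowMillis;

    /** Current fake time in milliseconds. */
    synchronized long now() {
        return nowMillis;
    }

    /** Blocks the caller until the fake time reaches the given deadline. */
    synchronized void sleepUntil(long deadlineMillis) throws InterruptedException {
        // Loop to guard against spurious wakeups and advances that fall short.
        while (nowMillis < deadlineMillis) {
            wait();
        }
    }

    /** Moves the fake time forward and wakes every thread blocked in sleepUntil(). */
    synchronized void advance(long millis) {
        nowMillis += millis;
        notifyAll();
    }
}
```

A test would start the flusher thread, call advance() past the next roll time, and then
join or otherwise wait on the thread, so it never depends on a real half-second sleep.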

> RollingFileSystemSink should eagerly rotate directories
> -------------------------------------------------------
>                 Key: HADOOP-12759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12759
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4664.001.patch
> The RollingFileSystemSink only rolls over to a new directory if a new metrics record comes in.  The issue is that HDFS does not update the file size until it's closed (HDFS-5478), and if no new metrics record comes in, then the file size will never be updated.
> This JIRA is to add a background thread to the sink that will eagerly close the file at the top of the hour.
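
The "top of the hour" scheduling in the description can be sketched as follows. This is
only an illustration under stated assumptions, not the attached patch: HourlyRoller,
millisUntilNextHour and start are hypothetical names, hour boundaries are taken from
epoch milliseconds (UTC-aligned), and the Runnable stands in for whatever closes the
sink's current file.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a background timer that fires at each hour boundary. */
class HourlyRoller {
    static final long HOUR_MILLIS = TimeUnit.HOURS.toMillis(1);

    /** Milliseconds from nowMillis to the next hour boundary (a full hour if exactly on one). */
    static long millisUntilNextHour(long nowMillis) {
        return HOUR_MILLIS - Math.floorMod(nowMillis, HOUR_MILLIS);
    }

    /** Runs closeCurrentFile at the next hour boundary and every hour after that. */
    static ScheduledExecutorService start(Runnable closeCurrentFile) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hourly-roller");
            t.setDaemon(true); // do not keep the JVM alive just for the timer
            return t;
        });
        timer.scheduleAtFixedRate(
            closeCurrentFile,
            millisUntilNextHour(System.currentTimeMillis()),
            HOUR_MILLIS,
            TimeUnit.MILLISECONDS);
        return timer;
    }
}
```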
