hadoop-hdfs-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9782) RollingFileSystemSink should have configurable roll interval
Date Tue, 09 Feb 2016 07:32:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138478#comment-15138478 ]

Daniel Templeton commented on HDFS-9782:
----------------------------------------

In HDFS-9780, [~andrew.wang] suggested that the interval be specified in milliseconds.  Given
that most intervals are going to be on the order of hours, specifying them in milliseconds seems
cruel.  How many milliseconds are in a day?  Plus, any interval shorter than about 10 minutes
risks creating a problematic number of small files.  For these reasons, I'm going to ignore
Andrew's suggestion and go with minutes as the unit for the interval.
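
To put the units argument in concrete terms, here is a minimal Java sketch of the arithmetic.
The class RollIntervalUnits and its methods are hypothetical and are not part of
RollingFileSystemSink.  The 10-minute floor is the rough estimate from the comment above, not
an actual Hadoop setting.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Hadoop.
final class RollIntervalUnits {
    // Assumed floor, taken from the "about 10 minutes" estimate above.
    static final long MIN_SENSIBLE_MINUTES = 10;

    // Convert an interval given in minutes to the millisecond value
    // an operator would otherwise have to type by hand.
    static long minutesToMillis(long minutes) {
        if (minutes <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // True when the interval is short enough to risk many small files.
    static boolean risksSmallFiles(long minutes) {
        return minutes < MIN_SENSIBLE_MINUTES;
    }
}
```

With these units, a one-day interval is written as 1440 minutes rather than 86400000
milliseconds.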

> RollingFileSystemSink should have configurable roll interval
> ------------------------------------------------------------
>
>                 Key: HDFS-9782
>                 URL: https://issues.apache.org/jira/browse/HDFS-9782
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>
> Right now it defaults to rolling at the top of every hour.  Instead that interval should
> be configurable.  The interval should also allow for some play so that all hosts don't try
> to flush their files simultaneously.
> I'm filing this in HDFS because I suspect it will involve touching the HDFS tests.  If
> it turns out not to, I'll move it into common instead.
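
One way to read the "play" requested in the description is a random per-host offset added to
each roll boundary.  The Java sketch below illustrates that idea under stated assumptions: the
class RollScheduleSketch, its method names, and the maxOffsetMillis parameter are all
hypothetical, and boundaries are assumed to be aligned to the Unix epoch.  It is not the
RollingFileSystemSink implementation.

```java
import java.util.Random;

// Hypothetical sketch for illustration only; not part of Hadoop.
final class RollScheduleSketch {
    // Next interval boundary strictly after nowMillis, on a grid
    // aligned to the Unix epoch (assumes nowMillis >= 0).
    static long nextBoundary(long nowMillis, long intervalMillis) {
        if (nowMillis < 0 || intervalMillis <= 0) {
            throw new IllegalArgumentException("invalid time or interval");
        }
        return (nowMillis / intervalMillis + 1) * intervalMillis;
    }

    // Next boundary plus a random offset in [0, maxOffsetMillis], so that
    // hosts sharing an interval do not all flush at the same instant.
    static long nextFlushTime(long nowMillis, long intervalMillis,
                              long maxOffsetMillis, Random rng) {
        if (maxOffsetMillis < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        // floorMod keeps the result non-negative even for negative random longs.
        long offset = Math.floorMod(rng.nextLong(), maxOffsetMillis + 1);
        return nextBoundary(nowMillis, intervalMillis) + offset;
    }
}
```

With an interval of 3600000 milliseconds, nextBoundary reproduces the current top-of-the-hour
behavior (in UTC), and a nonzero maxOffsetMillis spreads the flushes over a window after it.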



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
