hadoop-hdfs-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9782) RollingFileSystemSink should have configurable roll interval
Date Sat, 27 Feb 2016 03:48:18 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated HDFS-9782:
    Attachment: HDFS-9782.005.patch

Turns out that HADOOP-8608 doesn't actually help me.  It adds the {{getTimeDuration()}} method
to {{Configuration}}, but metrics are initialized with a {{SubsetConfiguration}} (from Apache
Commons).  Looks like I still have to do the parsing by hand.  I have switched over to using
{{TimeUnit}} for the conversions, though.
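
For illustration, hand parsing with {{TimeUnit}} conversions could look roughly like the sketch below.  This is not the code from the attached patch.  The class name {{RollIntervalParser}}, the accepted unit spellings, and the choice to treat a bare number as hours are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: parses an interval string by hand.
final class RollIntervalParser {
  // Assumed format: a non-negative integer followed by an optional unit word.
  private static final Pattern INTERVAL =
      Pattern.compile("^\\s*(\\d+)\\s*([a-zA-Z]*)\\s*$");

  /** Converts strings such as "30s", "5 minutes" or "1h" to milliseconds. */
  static long parseMillis(String value) {
    Matcher m = INTERVAL.matcher(value);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized interval: " + value);
    }
    long amount = Long.parseLong(m.group(1));
    String unit = m.group(2).toLowerCase(Locale.ROOT);
    switch (unit) {
      case "s": case "sec": case "second": case "seconds":
        return TimeUnit.SECONDS.toMillis(amount);
      case "m": case "min": case "minute": case "minutes":
        return TimeUnit.MINUTES.toMillis(amount);
      // Assumption: a bare number means hours, mirroring the hourly default.
      case "": case "h": case "hr": case "hour": case "hours":
        return TimeUnit.HOURS.toMillis(amount);
      case "d": case "day": case "days":
        return TimeUnit.DAYS.toMillis(amount);
      default:
        throw new IllegalArgumentException("Unrecognized unit: " + unit);
    }
  }
}
```

The string handed to such a parser would come from the sink's {{SubsetConfiguration}}; the property name is left out here because it is not given in this message.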

After trying to fake the clock, I came to a useful realization.  The {{BPServiceActor}}
tests are just trying to test the timing.  I'm already doing that in the {{TestRollingFileSystemSink}}
tests.  What the test with the sleeps is meant to check is whether the flush thread successfully
flushes the logs, which can only be tested by actually scheduling it to run.  With that in
mind, I found a way to test that functionality with no sleeps in the common case.  The sleeps
are still there, just in case, but I've never seen the test sleep even once.  I also bumped the
max sleep time in the test way up so that the chance of flakiness is approximately 0.
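
The patch is not reproduced in this message, so the following is only one possible shape for a check that does not sleep in the common case: test the condition first, and fall back to short sleeps with a generous upper bound only if it is not yet true.  {{BoundedWait}} and its parameters are hypothetical names for this sketch, not Hadoop APIs.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: returns immediately when the condition already
// holds, and only sleeps (up to maxWaitMillis in total) when it does not.
final class BoundedWait {
  static boolean waitFor(BooleanSupplier condition, long pollMillis,
      long maxWaitMillis) {
    long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;  // gave up: the condition never became true
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }
}
```

With a helper like this, a large {{maxWaitMillis}} costs nothing when the flush has already happened, and only slows the test down on the rare run where it has not.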

I still need to do more manual testing, but first let's see if this passes muster.

> RollingFileSystemSink should have configurable roll interval
> ------------------------------------------------------------
>                 Key: HDFS-9782
>                 URL: https://issues.apache.org/jira/browse/HDFS-9782
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>         Attachments: HDFS-9782.001.patch, HDFS-9782.002.patch, HDFS-9782.003.patch, HDFS-9782.004.patch,
> Right now it defaults to rolling at the top of every hour.  Instead that interval should
> be configurable.  The interval should also allow for some play so that all hosts don't try
> to flush their files simultaneously.
> I'm filing this in HDFS because I suspect it will involve touching the HDFS tests.  If
> it turns out not to, I'll move it into common instead.
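
As a rough sketch of the "play" described in the issue, the next roll time can be computed as the next interval boundary plus a random offset, so that hosts do not all flush at the same instant.  The names {{RollSchedule}}, {{nextRollMillis}} and {{maxOffsetMillis}} are assumptions for this example and do not come from the patch.

```java
import java.util.Random;

// Hypothetical sketch: schedule the next roll at the next interval boundary
// plus a random per-host offset.
final class RollSchedule {
  /**
   * @param nowMillis       current time in ms since the epoch (non-negative)
   * @param intervalMillis  roll interval in ms, must be positive
   * @param maxOffsetMillis upper bound on the random offset in ms, 0 disables it
   */
  static long nextRollMillis(long nowMillis, long intervalMillis,
      long maxOffsetMillis, Random random) {
    if (intervalMillis <= 0 || maxOffsetMillis < 0) {
      throw new IllegalArgumentException("Invalid interval or offset");
    }
    // First boundary strictly after now, e.g. the top of the next hour.
    long nextBoundary = (nowMillis / intervalMillis + 1) * intervalMillis;
    // nextDouble() is in [0, 1), so the offset stays below maxOffsetMillis
    // for any practical offset size.
    long offset = (long) (random.nextDouble() * maxOffsetMillis);
    return nextBoundary + offset;
  }
}
```

For example, with a one-hour interval and a one-minute maximum offset, each host would roll somewhere in the first minute after the top of the hour.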
