flume-dev mailing list archives

From "Mike Percy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLUME-2725) HDFS Sink does not use configured timezone for rounding
Date Fri, 08 Jul 2016 21:54:11 GMT

    [ https://issues.apache.org/jira/browse/FLUME-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368551#comment-15368551
] 

Mike Percy commented on FLUME-2725:
-----------------------------------

Added a minor comment to the code review. I took a quick look at the patch and it looks pretty
good.

A config param to maintain backcompat could be useful in some circumstances, but the fact
that the rounding did not use the timezone in the first place seems like a bug to me. I
don't have a strong opinion either way; maybe others do.

> HDFS Sink does not use configured timezone for rounding
> -------------------------------------------------------
>
>                 Key: FLUME-2725
>                 URL: https://issues.apache.org/jira/browse/FLUME-2725
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>            Reporter: Eric Czech
>            Assignee: Denes Arvay
>            Priority: Minor
>         Attachments: FLUME-2725.patch
>
>
> When a BucketPath used by an HDFS sink is configured to run with some roundUnit and
> roundValue > 1 (e.g. 6 hours), the "roundDown" function used by BucketPath does not
> actually round the date correctly.
>
> That function calls TimestampRoundDownUtil which creates a Calendar instance using the
> *local* timezone to truncate a unix timestamp rather than the TimeZone that the sink was
> configured to convert dates to paths with (and that timezone is already available in the
> BucketPath class but it just isn't passed to TimestampRoundDownUtil).
>
> The net effect of this is that if a flume jvm is running on a system with an EST clock
> while trying to write, say, 6 hour directories in UTC time, the directories are written
> with the hours 04, 10, 16, 22 rather than 00, 06, 12, 18 like you would expect.
>
> I found a workaround for this by passing "-Duser.timezone=<hdfs_sink_timezone>" as a
> system property, but I wanted to create a ticket for this since it seems like it would be
> very minimal effort to carry that configured timezone down into the rounding utility as
> well.
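
The rounding behaviour described above can be reproduced with a small sketch. This is not
Flume's actual code: the class RoundingSketch and its methods roundDownHours and utcHour are
hypothetical names made up for illustration, and the example assumes the "local" clock is
America/New_York in July (UTC-4), which is the offset that yields the 04, 10, 16, 22 hours
mentioned in the report. It only shows how the TimeZone given to Calendar decides where the
6-hour boundaries fall.

```java
import java.time.Instant;
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical stand-in for the rounding helper; not Flume's actual implementation.
class RoundingSketch {

    // Rounds timestampMillis down to a multiple of roundHours on the wall clock of tz.
    static long roundDownHours(long timestampMillis, int roundHours, TimeZone tz) {
        Calendar cal = Calendar.getInstance(tz);
        cal.setTimeInMillis(timestampMillis);
        int hour = cal.get(Calendar.HOUR_OF_DAY);
        cal.set(Calendar.HOUR_OF_DAY, (hour / roundHours) * roundHours);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    // Hour of day (0-23) of the timestamp when rendered in UTC, i.e. the hour
    // that would end up in a directory name formatted in UTC.
    static int utcHour(long timestampMillis) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(timestampMillis);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2016-07-08T21:54:11Z").toEpochMilli();

        // Assumed local zone of the JVM (UTC-4 on this date).
        TimeZone local = TimeZone.getTimeZone("America/New_York");
        // Zone the sink is assumed to be configured with.
        TimeZone configured = TimeZone.getTimeZone("UTC");

        // Rounded on the local clock, then formatted in UTC: off-grid hour.
        System.out.println(utcHour(roundDownHours(ts, 6, local)));      // 16
        // Rounded on the configured zone's clock: expected hour.
        System.out.println(utcHour(roundDownHours(ts, 6, configured))); // 18
    }
}
```

Passing the configured TimeZone into the rounding step, as the second call does, is the kind
of change the ticket asks for; the -Duser.timezone workaround gets the same result by
changing the JVM's default zone instead.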



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
