flume-dev mailing list archives

From "Satoshi Iijima (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLUME-3083) Taildir source can miss events if last updated time in same second as file mtime
Date Sun, 09 Apr 2017 12:24:42 GMT

    [ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962127#comment-15962127 ]

Satoshi Iijima commented on FLUME-3083:

If the idleTimeout property is set to less than 1000 milliseconds, the missing events that
you point out might be possible.
But such a setting is very inefficient, because TaildirSource.closeTailFiles() would be called
continually and needlessly.
If idleTimeout is set properly (default: 120000 milliseconds), the current TaildirSource
would not miss events.

> Taildir source can miss events if last updated time in same second as file mtime
> --------------------------------------------------------------------------------
>                 Key: FLUME-3083
>                 URL: https://issues.apache.org/jira/browse/FLUME-3083
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7.0
>            Reporter: eskrm
>         Attachments: FLUME-3083-0.patch
> I suspect there is a scenario where the taildir source can miss reading events from a log file due to how the source determines whether a file has been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified();
> ...
> tf.setNeedTail(updated);
> {code}
> Consider this sequence of events from TaildirSource.process(). Assume they all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call will set ReliableTaildirEventReader.updateTime to current time in milliseconds
> #* Assume the underlying file has not been updated within the last idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes in idleFileCheckerRunnable
> # tf.needTail is false. Skip reading file.
> # Underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before close to read any pending events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit() which updates the tail file's position and sets its last updated time to ReliableTaildirEventReader.updateTime from 1.a
> #* Close file
> # Events E2 are written to underlying file. File's modification time is in the same second as the tail file's last updated time.
> # Since the time returned by File.lastModified() is the mtime in seconds converted to milliseconds, the file's last modified time is less than the tail file's last updated time and taildir won't reopen the file to read E2.
> #* This behavior of File.lastModified() may be platform/JVM specific. I confirmed the behavior using OpenJDK 8 on Ubuntu precise.
> Can someone confirm this?
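
The comparison described in the issue can be illustrated with a minimal, self-contained sketch. It does not use Flume itself: the class name, the helper methods, and the timestamps are hypothetical, and truncateToSeconds() only models a platform where File.lastModified() reports whole seconds, as the reporter observed. The needTail() helper mirrors the quoted check from ReliableTaildirEventReader.

```java
final class MtimeGranularitySketch {

    // Models a File.lastModified() that reports whole seconds only:
    // the millisecond part of the real write time is dropped.
    // (Assumption for illustration; actual behavior is platform/JVM specific.)
    public static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    // Same comparison as the quoted snippet:
    // updated = tf.getLastUpdated() < f.lastModified()
    public static boolean needTail(long lastUpdatedMillis, long reportedMtimeMillis) {
        return lastUpdatedMillis < reportedMtimeMillis;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps: updateTime is taken 300 ms into a second,
        // and E2 is written 500 ms later, still inside the same second.
        long updateTime = 1_491_740_682_300L;
        long e2WriteTime = updateTime + 500L;

        long reportedMtime = truncateToSeconds(e2WriteTime);

        System.out.println("second-granularity mtime, needTail = "
                + needTail(updateTime, reportedMtime));
        System.out.println("millisecond-granularity mtime, needTail = "
                + needTail(updateTime, e2WriteTime));
    }
}
```

With the truncated mtime the check evaluates to false, so the file would not be marked for tailing; with a millisecond-precision mtime the same write is detected. This only demonstrates the arithmetic of the comparison, not whether the full sequence of steps can occur with a given idleTimeout.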

This message was sent by Atlassian JIRA
