flume-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Percy <mpe...@apache.org>
Subject Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories
Date Tue, 14 Jun 2016 01:42:42 GMT


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
line 164
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line164>
> >
> >     This sorting function seems problematic to me. It can call stat() up to n^2
times (assuming quicksort). Shouldn't we get the last modified time of each file in the list
once and then sort it? Why are we sorting, anyway?
> >     
> >     By the way, I just checked the OpenJDK source code and indeed every File.lastModified()
maps to FileSystem.getLastModifiedTime(f) which maps to stat(2). See https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205
> 
> Attila Simon wrote:
>     Sorting was part of the original implementation (my change didn't make it worse).
I wanted to be non-intrusive.

If you want to maintain the sorting behavior (and I still don't know how it is used or why
it is required) then please do the stat() call on each of the files in the list, and cache
the mtime of each file, then sort the files based on those cached mtimes. It just doesn't
make any sense to do IO to read the metadata O(n^2) times.


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java,
line 49
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404560#file1404560line49>
> >
> >     style nit: leave spaces around your brackets. We use the java style guide in
Flume (admittedly not as consistently as I wish we did, though)
> 
> Attila Simon wrote:
>     Related to coding style nits: is there any importable flume related style configuration
for intellij somewhere I haven't found already? If yes could you please point me to its direction?

Nope, but a checkstyle file would be welcome that enforced Java style with 2-space indent.


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
-----------------------------------------------------------


On June 13, 2016, 2:14 p.m., Attila Simon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48161/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 2:14 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2918
>     https://issues.apache.org/jira/browse/FLUME-2918
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> The way TailDir source checks which files should be tracked was improved. Existing implementation
caused unneccessary high CPU usage for huge (+50K files) directories. This fix allows users
to eliminate continous listing of parent directory (on each Source.process invocation) and
introduce a more performant method for listing&matching files.
> 
> used java.nio.file.DirectoryStream to filter files
> made pattern match calculation optionally cached
> added junit tests
> added javadoc
> added license
> 
> 
> Diffs
> -----
> 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
5b6d465 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
PRE-CREATION 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
8816327 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
6165276 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
PRE-CREATION 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
f9e614c 
> 
> Diff: https://reviews.apache.org/r/48161/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install -DskipTests -> built
> junit tests for flume-taildir-source module -> passed
> 
> 
> Thanks,
> 
> Attila Simon
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message