flume-issues mailing list archives

From "Confuse (Jira)" <j...@apache.org>
Subject [jira] [Created] (FLUME-3350) Spooldir source may collect empty files and write them to HDFS
Date Thu, 09 Jan 2020 13:44:00 GMT
Confuse created FLUME-3350:

             Summary: Spooldir source may collect empty files and write them to HDFS
                 Key: FLUME-3350
                 URL: https://issues.apache.org/jira/browse/FLUME-3350
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.9.0
            Reporter: Confuse
         Attachments: image-2020-01-09-21-33-55-306.png

When I collect data from a spooldir source and write it to HDFS, I found that if an empty
file is created in spoolDir, an empty file with the same name appears on HDFS. This seems
unreasonable. After reading the source code, I found that the following condition in
SpoolDirectorySource will never be true:

 public void run() {
      int backoffInterval = 250;
      boolean readingEvents = false;
      try {
        while (!Thread.interrupted()) {
          readingEvents = true;
          List<Event> events = reader.readEvents(batchSize);
          readingEvents = false;
          // this condition will never be true
          if (events.isEmpty()) {

Please confirm whether this behavior is a bug. In my opinion, collecting empty files is
meaningless. For HDFS in particular, storing a large number of small files should be avoided.
Even if a user unintentionally puts many empty files into spoolDir, Flume should handle them
instead of writing them to HDFS.
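
A minimal sketch of the kind of filtering suggested above, assuming that an empty spooled
file reaches the source as an event with a zero-length body (an assumption; this report does
not confirm the mechanism). EmptyBodyFilter and dropEmpty are hypothetical names that do not
exist in Flume, and raw byte[] bodies stand in for Flume Event objects so that the sketch is
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: removes zero-length bodies from a batch.
final class EmptyBodyFilter {
    private EmptyBodyFilter() {}

    // Returns a new list holding only the bodies that contain at least one byte.
    // Assumption: an empty spooled file shows up as a body of length 0.
    static List<byte[]> dropEmpty(List<byte[]> bodies) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] body : bodies) {
            if (body != null && body.length > 0) {
                kept.add(body);
            }
        }
        return kept;
    }
}
```

With such a filter applied to each batch before it is handed to the channel, a batch that
contains only zero-length bodies would come back empty, so a check like events.isEmpty()
could then be reached for empty files.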
