commons-issues mailing list archives

From "Tyler Murry (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IO-523) Do not reload the entire file when a tailed file's length and position are the same but the file is newer
Date Sat, 26 Nov 2016 16:25:59 GMT

     [ https://issues.apache.org/jira/browse/IO-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Murry updated IO-523:
---------------------------
    Description: 
In the Tailer class, when the file length is equal to the position and the file is newer,
the following branch is executed:

{code:title=org.apache.commons.io.input.Tailer.java}
// ----------- Lines 461 - 472 --------------
// ...
else if (newer) {
  /*
   * This can happen if the file is truncated or overwritten with the exact same length of
   * information. In cases like this, the file position needs to be reset
   */
  position = 0;
  reader.seek(position); // cannot be null here

  // Now we can read new lines
  position = readLines(reader);
  last = file.lastModified();
}
// ...
{code}

The comments in the branch specifically mention wanting to reset the position and reload the
entire file. However, I believe this can lead to undesirable effects in certain cases.

One example is tailing one file into another file. If this branch is hit, the entire input
file is copied into the output file again. This is especially troublesome if you have a
rogue file whose timestamp changes regularly without any content changes.
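
The timestamp-only scenario can be reproduced without Commons IO. The sketch below uses only
java.nio.file; the class and method names are hypothetical. It moves a file's modification
time forward without writing any bytes, so the length stays the same while the file appears
newer. Those are the two conditions that lead into the branch quoted above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical demo class; not part of Commons IO.
class TouchWithoutChange {

    /** Returns {lengthBefore, lengthAfter, mtimeBefore, mtimeAfter} for a file that is only "touched". */
    static long[] touch() {
        try {
            Path file = Files.createTempFile("tailed", ".log");
            try {
                Files.write(file, "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
                long lengthBefore = Files.size(file);
                long mtimeBefore = Files.getLastModifiedTime(file).toMillis();

                // Move the timestamp forward by 10 seconds without writing any bytes.
                Files.setLastModifiedTime(file, FileTime.fromMillis(mtimeBefore + 10_000));

                long lengthAfter = Files.size(file);
                long mtimeAfter = Files.getLastModifiedTime(file).toMillis();
                return new long[] {lengthBefore, lengthAfter, mtimeBefore, mtimeAfter};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = touch();
        System.out.println("same length: " + (r[0] == r[1]));
        System.out.println("newer:       " + (r[3] > r[2]));
    }
}
```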

My suggested improvement is to remove this branch, provided that also works for the general
case. Alternatively, a parameter could be added so that callers can disable this behavior in
special cases.
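
As a rough illustration of the parameter option, the per-poll decision can be modelled as a
pure function with an opt-out flag. This is a minimal sketch, not the Tailer implementation:
the enum, the method, and the rereadOnTimestampChange parameter are all hypothetical names.
The handling of the other two cases (a shorter file treated as rotated, a longer file read
from the current position) is an assumption about the surrounding code that the snippet above
elides.

```java
// Hypothetical model of the tailer's per-poll decision; names are illustrative only.
class TailDecisionSketch {

    enum Action { REOPEN_FROM_START, READ_NEW_LINES, REREAD_FROM_START, DO_NOTHING }

    static Action decide(long length, long position, boolean newer,
                         boolean rereadOnTimestampChange) {
        if (length < position) {
            // Assumed: a shorter file means it was rotated or truncated.
            return Action.REOPEN_FROM_START;
        }
        if (length > position) {
            // New data was appended; continue from the current position.
            return Action.READ_NEW_LINES;
        }
        // length == position: the case discussed in this issue.
        if (newer && rereadOnTimestampChange) {
            // Mirrors the quoted branch: position = 0, then read everything again.
            return Action.REREAD_FROM_START;
        }
        // Flag disabled, or the timestamp did not change: leave the position alone.
        return Action.DO_NOTHING;
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 100, true, true));
        System.out.println(decide(100, 100, true, false));
    }
}
```

With the flag disabled, a timestamp-only change is ignored. The trade-off is that a genuine
overwrite with exactly the same length, which is the case the original comment describes,
would also go unnoticed.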


  was:
In the Tailer class, when the file length is equal to the position and the file is newer,
the following branch is executed:

{code:title=org.apache.commons.io.input.Tailer.java}
// ----------- Lines 461 - 472 --------------
// ...
else if (newer) {
  /*
   * This can happen if the file is truncated or overwritten with the exact same length of
   * information. In cases like this, the file position needs to be reset
   */
  **position = 0;**
  reader.seek(position); // cannot be null here

  // Now we can read new lines
  position = readLines(reader);
  last = file.lastModified();
}
// ...
{code}

The comments in the branch specifically mention wanting to reset the position and reload the
entire file. However, I believe this can lead to undesirable effects in certain cases.

One example is when you are tailing one file into another file. If this branch is hit, the
entire input file is recopied into the output file. This is especially troublesome if you
have a rogue file whose timestamp changes regularly without any content changes.

My improvement would be to simply remove this branch if it works for the general case as well.
Or, at least for special cases, allow a parameter to be checked to prevent this behavior.



> Do not reload the entire file when a tailed file's length and position are the same but the file is newer
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: IO-523
>                 URL: https://issues.apache.org/jira/browse/IO-523
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Streams/Writers
>    Affects Versions: 2.5
>         Environment: Windows 10
>            Reporter: Tyler Murry
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
