commons-issues mailing list archives

From "Stephen Kestle (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files
Date Fri, 06 May 2011 01:29:03 GMT

    [ https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029690#comment-13029690 ]

Stephen Kestle commented on IO-271:
-----------------------------------

Late modification-date updating would be needed in edge cases around merging directories and
detecting whether a file had been copied successfully. This is due to "holes" that could form
between batches.

After talking with others today, we came up with the idea of using an incremental file filter
that performs the copy operation itself and then returns false, so that the list of Files does
not grow.
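
A minimal sketch of that idea, assuming a flat source directory and Java 8 or later. The class name {{CopyingFileFilter}} and its {{getCopied()}} accessor are hypothetical, not part of Commons IO, and {{java.nio.file.Files.copy}} stands in for whatever copy routine {{FileUtils}} would actually use:

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical filter (not part of Commons IO): copies each regular file it is
 * asked about into destDir, then rejects it so listFiles() collects nothing.
 */
class CopyingFileFilter implements FileFilter {
    private final File destDir;
    private int copied;

    CopyingFileFilter(File destDir) {
        this.destDir = destDir;
    }

    @Override
    public boolean accept(File source) {
        if (source.isFile()) {
            try {
                // Copy as a side effect of filtering; subdirectories are skipped here.
                Files.copy(source.toPath(),
                        new File(destDir, source.getName()).toPath(),
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                copied++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return false; // never accepted, so the result list stays empty
    }

    int getCopied() {
        return copied;
    }
}
```

Calling {{srcDir.listFiles(new CopyingFileFilter(destDir))}} should then return an empty array once every file has been copied. Note that this only avoids retaining the {{File}} objects; the {{String[]}} of names produced by {{list()}} is still allocated.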

My earlier estimate of memory usage was wrong - {{listFiles()}} is far worse:

# It calls {{list()}} (everything does, it's a native method)
# It allocates a new array for the files
# It creates the {{File}} objects and (on Linux) resolves a new string for the full path of
each file. So the deeper the directory that holds the many files, the longer each path will be
(I was only using one short directory name when I said double the memory usage)
* If you're using the {{listFiles(FileFilter)}} method, an {{ArrayList}} is populated and
then copied to an array at the end, using more memory.
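
The steps above can be paraphrased roughly as follows. This is an illustration of the described behaviour, not the JDK source; {{ListFilesSketch}} and {{listFilesLike}} are hypothetical names, and for brevity the sketch uses a list even when no filter is given:

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

class ListFilesSketch {
    /** Paraphrase of the steps described above; not the JDK source. */
    static File[] listFilesLike(File dir, FileFilter filter) {
        String[] names = dir.list();          // array of bare names
        if (names == null) {
            return null;                      // not a directory, or an I/O error
        }
        List<File> accepted = new ArrayList<>();
        for (String name : names) {
            // One File object plus its own path String for every entry.
            File child = new File(dir, name);
            if (filter == null || filter.accept(child)) {
                accepted.add(child);
            }
        }
        // Extra copy from the list into the array that is returned.
        return accepted.toArray(new File[0]);
    }
}
```

While this method runs, the names array, the {{File}} objects, and the intermediate list are all reachable at once, which is where the extra memory goes.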

*Notes:*
* Trying to find out how much memory is used *while* {{File}} is performing its internal
copies and resolves is not trivial
* My memory use calculations (107 bytes vs 60 bytes for 10 char files in a 4 char directory)
were taken after I'd done {{System.gc()}}.
* If I skipped the {{gc}}, the Files took 167 bytes at the point of measuring after a 5 second
sleep
* Our ant tests (where this all started) seem to indicate that (for 500,000 files, under
the same conditions as my test above)
** {{File.list()}} (which ant's copy initially uses) requires around 30 MB
** {{File.listFiles()}} (which commons-io uses) requires around 150 MB
** These requirements were found by lowering the JVM Xmx setting until the respective {{File.list*()}}
call only just passed without an OOME.
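
A rough probe along the lines of the measurements above might look like this. It is only a sketch: {{ListingMemoryProbe}} is a hypothetical name, {{System.gc()}} is merely a hint to the JVM, and the reported figure is an approximation rather than an exact per-file cost:

```java
import java.io.File;

class ListingMemoryProbe {
    /** Approximate heap in use after suggesting a garbage collection. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        boolean namesOnly = args.length > 1 && args[1].equals("list");

        long before = usedBytes();
        Object[] entries;
        if (namesOnly) {
            entries = dir.list();      // String[] of names only
        } else {
            entries = dir.listFiles(); // File[] with full paths
        }
        long after = usedBytes();

        // entries is still referenced here, so the listing cannot be collected early.
        int count = entries == null ? 0 : entries.length;
        System.out.println(count + " entries, ~" + (after - before) + " bytes retained");
    }
}
```

Running it with a small heap, for example {{java -Xmx32m ListingMemoryProbe <dir> list}} and then again without the {{list}} argument, and lowering {{-Xmx}} until one variant fails, reproduces the kind of comparison described in the last bullet. The 32m value is only a placeholder.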

I will post more conclusive results soon once I've done some more tests using Xmx with only
the directory listing methods.

> FileUtils.copyDirectory should be able to handle arbitrary number of files
> --------------------------------------------------------------------------
>
>                 Key: IO-271
>                 URL: https://issues.apache.org/jira/browse/IO-271
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.0.1
>            Reporter: Stephen Kestle
>            Priority: Minor
>
> File.listFiles() uses up to a bit over 2 times as much memory as File.list().  The latter should be used in doCopyDirectory where there is no filter specified.
> This memory usage is a problem when copying directories with hundreds of thousands of files.
> I was also thinking of the option of implementing a file filter (that could be composed with the supplied filter) that would batch the file copy operation: copy the first 10000 (that match), then the next 10000, and so on.
> Because of the lack of ordering consistency (between runs) of File.listFiles(), there would need to be a final file filter that would accept files that have not successfully been copied.
> I'm primarily concerned about copying into an empty directory (I validate this beforehand), but for general operation where it's a merge, the modification date re-writing should only be done in the final run of copies so that while batching occurs (and indeed the final "missed" filtering) files do not get copied if they have been modified after the start time. (I presume that I'm reading FileUtils correctly in that it overwrites files...)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
