From issues-return-18573-apmail-commons-issues-archive=commons.apache.org@commons.apache.org Fri May 6 02:54:45 2011
Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Reply-To: issues@commons.apache.org
Date: Fri, 6 May 2011 02:54:03 +0000 (UTC)
From: "Stephen Kestle (JIRA)"
To: issues@commons.apache.org
Message-ID: <413692125.26943.1304650443142.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1437856922.24418.1304595843118.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint:
30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029703#comment-13029703 ]

Stephen Kestle commented on IO-271:
-----------------------------------

I created a main method that starts a Thread which calls either {{list()}} or {{listFiles()}} on a directory of 500,000 files, addressed in three ways:
* the full canonical directory
* the relative "log" directory when running main from my application dir
* the "." directory when running main from my application's log dir

{{list()}} required a constant {{-Xmx41m}} for all invocations.

{{listFiles()}} required:
* 91MB for "."
* 94MB for "log" (3MB more than "."; a rough estimate of 3 chars * 2 bytes * 2 copies * 500000 gives 6MB, so the same order of magnitude)
* A whopping 181MB for the full canonical Program Files path (which is the most likely path we'd be using)

_Note that the JVM needs somewhere between 1000-1500k to launch_

So {{listFiles()}} uses something like 4.5 times the memory of {{list()}} (181MB against 41MB), which I think is significant enough to fix.

I'd suggest that when the file filter is {{null}}, {{list()}} is used, and when a filter is given, {{list(FilenameFilter)}} is used, where the filter:
# takes the string
# creates a File object
# delegates to the given {{FileFilter}}
# throws away the File and accepts or rejects the String based on the {{FileFilter}} result

Extra for experts (that's you guys :)): switch the above FileFilter behaviour based on the amount of free memory in the system while processing the files, by retaining the {{File}} array, the starting memory stats, a count, etc. That is, if memory is getting low and the number of Files in the (Object) array is high, run through the array, replace each File with its name, and continue by name.
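The suggestion above (use {{list()}} when there is no filter, otherwise wrap the {{FileFilter}} in a {{FilenameFilter}}) could be sketched roughly as follows. This is only an illustration, not the Commons IO implementation: the class name {{NameListing}} and the methods {{adapt}} and {{listNames}} are hypothetical names made up for this example.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.FilenameFilter;

// Hypothetical helper for illustration; not part of Commons IO.
final class NameListing {

    // Wraps a FileFilter so it can be passed to File.list(FilenameFilter).
    // A File is created per name only for the duration of the accept call,
    // so only the accepted name Strings are retained in the result.
    static FilenameFilter adapt(final FileFilter delegate) {
        return (dir, name) -> delegate.accept(new File(dir, name));
    }

    // Returns the child names of dir, or null if dir is not a directory
    // or an I/O error occurs (the same contract as File.list()).
    static String[] listNames(File dir, FileFilter filter) {
        return filter == null ? dir.list() : dir.list(adapt(filter));
    }
}
```

A caller would then build each {{File}} from the directory and the name only at the moment it copies that entry, rather than holding 500,000 {{File}} objects at once.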

> FileUtils.copyDirectory should be able to handle arbitrary number of files
> --------------------------------------------------------------------------
>
>                 Key: IO-271
>                 URL: https://issues.apache.org/jira/browse/IO-271
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.0.1
>            Reporter: Stephen Kestle
>            Priority: Minor
>
> File.listFiles() uses up to a bit over 2 times as much memory as File.list(). The latter should be used in doCopyDirectory where there is no filter specified.
> This memory usage is a problem when copying directories with hundreds of thousands of files.
> I was also thinking of the option of implementing a file filter (that could be composed with the inputted filter) that would batch the file copy operation; copy the first 10000 (that match), then the next 10000 etc etc.
> Because of the lack of ordering consistency (between runs) of File.listFiles(), there would need to be a final file filter that would accept files that have not successfully been copied.
> I'm primarily concerned about copying into an empty directory (I validate this beforehand), but for general operation where it's a merge, the modification date re-writing should only be done in the final run of copies so that while batching occurs (and indeed the final "missed" filtering) files do not get copied if they have been modified after the start time. (I presume that I'm reading FileUtils correctly in that it overrides files...)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira