Date: 4 Jun 2003 16:10:05 -0000
Message-ID: <20030604161005.13410.qmail@nagoya.betaversion.org>
From: bugzilla@apache.org
To: dev@ant.apache.org
Subject: DO NOT REPLY [Bug 20103] - FileSet horrible performance when dir has huge number of subdirs

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS
THROUGH THE WEB INTERFACE AVAILABLE AT
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103
FileSet horrible performance when dir has huge number of subdirs

------- Additional Comments From robert@windermere.com  2003-06-04 16:10 -------
I agree with the thoughts presented on revising the way the various
DirectoryScanner implementations do their business:

  - Scan only the directories required to satisfy the wildcard patterns.
  - Include files that have no wildcard patterns directly (unless they
    have been excluded).

I created a quick-and-dirty override of the task that adds a remoteScan
switch, which turns off remote scanning completely. In that case it uses
DirectoryNoScanner instead of FTPDirectoryScanner. It is not very smart;
it really creates the exact opposite of the situation we have now. But
since I know my file list contains no patterns, it makes a decent
performance test.

With the remoteScan attribute set to the default of "yes", I see the
following:

  - A list of 10 files takes approx. 5 minutes.
  - A list of 35 files takes approx. 10 minutes.
  - A list of 100 files takes approx. 30 minutes.
If the list gets much larger than this, the server times out (during the
scanning) before any files are downloaded.

With the remoteScan attribute set to "no", I see the following:

  - A list of 1000 files takes approx. 40 minutes.
  - A list of 2500 files takes approx. 100 minutes.

Downloading begins almost immediately once the task connects to the
server.

These figures depend heavily on my connection speed, the FTP server's
response time, and the file size (approx. 25 KB each). But they do give
a good indication of the potential performance gains.
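The two proposals in this comment (scan only the directories the wildcards require, and include literal file names directly unless excluded) can be sketched in Java. This is a minimal, hypothetical sketch and not Ant's code: the class PatternSplit and its methods are invented for illustration, patterns are assumed to use '/' separators, and excludes are compared as literal names rather than matched as patterns.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Ant). It splits include patterns into
// literal paths, which can be taken as-is, and wildcard patterns, for which
// only the static (wildcard-free) base directory needs to be listed.
final class PatternSplit {

    // True if the pattern contains an Ant-style wildcard ('*' or '?').
    static boolean hasWildcard(String pattern) {
        return pattern.indexOf('*') >= 0 || pattern.indexOf('?') >= 0;
    }

    // Leading directory segments that contain no wildcard, for example
    // "src/main/**/*.java" -> "src/main". Returns "." when there is no
    // static prefix, meaning the base directory itself must be scanned.
    static String staticBase(String pattern) {
        String[] segments = pattern.split("/");
        List<String> base = new ArrayList<>();
        // The last segment names files, so it never counts as a directory.
        for (int i = 0; i < segments.length - 1; i++) {
            if (hasWildcard(segments[i])) {
                break;
            }
            base.add(segments[i]);
        }
        return base.isEmpty() ? "." : String.join("/", base);
    }

    // Literal includes go straight into the result unless excluded.
    // Assumption: excludes are plain names here, not patterns.
    static Set<String> literalIncludes(List<String> includes, Set<String> excludes) {
        Set<String> result = new LinkedHashSet<>();
        for (String p : includes) {
            if (!hasWildcard(p) && !excludes.contains(p)) {
                result.add(p);
            }
        }
        return result;
    }

    // Distinct base directories that still have to be scanned remotely.
    static Set<String> dirsToScan(List<String> includes) {
        Set<String> dirs = new LinkedHashSet<>();
        for (String p : includes) {
            if (hasWildcard(p)) {
                dirs.add(staticBase(p));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> includes =
            List.of("docs/readme.txt", "data/a.csv", "src/main/**/*.java", "*.xml");
        Set<String> excludes = Set.of("data/a.csv");
        System.out.println(literalIncludes(includes, excludes)); // [docs/readme.txt]
        System.out.println(dirsToScan(includes));                 // [src/main, .]
    }
}
```

With a list made up entirely of literal names, as in the test above, dirsToScan returns an empty set and no remote listing is needed at all. A fuller version would also have to honor Ant's rule that a pattern ending in "/" is treated as if it ended in "/**".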
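To put the two sets of timings on a common footing, the reported figures can be converted to seconds per file. TransferRates is a hypothetical helper; its inputs are the approximate numbers quoted in this comment, and the remoteScan="yes" figures lump scan time and transfer time together.

```java
// Hypothetical helper: converts the approximate timings reported in this
// comment into seconds per file.
final class TransferRates {

    static double secondsPerFile(int files, double minutes) {
        return minutes * 60.0 / files;
    }

    public static void main(String[] args) {
        // {file count, minutes} as reported, remoteScan="yes" (scan + transfer)
        int[][] scanOn = {{10, 5}, {35, 10}, {100, 30}};
        // {file count, minutes} as reported, remoteScan="no" (transfer only)
        int[][] scanOff = {{1000, 40}, {2500, 100}};

        for (int[] r : scanOn) {
            System.out.printf("remoteScan=yes %5d files: %5.1f s/file%n",
                r[0], secondsPerFile(r[0], r[1]));
        }
        for (int[] r : scanOff) {
            System.out.printf("remoteScan=no  %5d files: %5.1f s/file%n",
                r[0], secondsPerFile(r[0], r[1]));
        }
        // Implied transfer rate, assuming approx. 25 KB per file.
        System.out.printf("approx. %.1f KB/s%n", 25.0 / secondsPerFile(1000, 40));
    }
}
```

With scanning on, that works out to roughly 17 to 30 seconds per file; with scanning off, about 2.4 seconds per file in both runs, which at approx. 25 KB per file implies roughly 10 KB/s of actual transfer.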