Return-Path:
Mailing-List: contact dev-help@ant.apache.org; run by ezmlm
Delivered-To: mailing list dev@ant.apache.org
Received: (qmail 45948 invoked from network); 4 Jun 2003 15:00:35 -0000
Received: from exchange.sun.com (192.18.33.10) by daedalus.apache.org with SMTP; 4 Jun 2003 15:00:35 -0000
Received: (qmail 10700 invoked by uid 50); 4 Jun 2003 15:02:51 -0000
Date: 4 Jun 2003 15:02:51 -0000
Message-ID: <20030604150251.10699.qmail@nagoya.betaversion.org>
From: bugzilla@apache.org
To: dev@ant.apache.org
Cc:
Subject: DO NOT REPLY [Bug 20103] - FileSet horrible performance when dir has huge number of subdirs
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS
THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103

FileSet horrible performance when dir has huge number of subdirs

------- Additional Comments From ddevienne@lgc.com 2003-06-04 15:02 -------

Here's my take on FileSet optimization, which meshes with the problem we are personally seeing here.

Problem: We have a large source base (5000+ files) which is scanned repeatedly to extract just a subset of these files, usually one or more subtrees of the fileset. In somepath/src, we select:

    com/acme/foo/**
    com/acme/bar/**

The fileset must be declared with dir="somepath/src" to keep the proper relative filenames (corresponding to the package names for a javac task), but obviously only somepath/src/com/acme/foo and somepath/src/com/acme/bar need to be scanned, not the full somepath/src. These repeated full scans of somepath/src to extract a subset of the sources are adding up to minutes...

Solution: I can think of two: a simple one, and a more difficult one.

Simple Solution: add a subelement (which can be specified more than once) which explicitly tells the FileSet which directory it should scan, rather than the one specified in the fileset's dir attribute.

Harder Solution: Infer the searchroots from the patterns themselves... Not impossible, but difficult.

The advantage of the simple solution is that it works with the use of selectors: since it's explicit, the build file writer knows that the selectors s/he uses do not affect the searchroots.

This of course doesn't solve the other performance problem of FileSet, when it is used with a long list of explicit filenames without patterns.

Thanks for reading, --DD
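
PS: To make the harder solution a bit more concrete, here is a rough sketch of how searchroots might be inferred from include patterns alone. It is only an illustration under simplifying assumptions: the class and method names (SearchRoots, staticPrefix, infer) are hypothetical and not part of Ant, and exclude patterns and selectors are ignored entirely. The idea is to keep the leading wildcard-free directory segments of each pattern, then drop any root already covered by a shorter one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper, not part of Ant: infers searchroots from include patterns. */
final class SearchRoots {

    /** Returns the leading wildcard-free directory part of one include pattern. */
    static String staticPrefix(String pattern) {
        String[] segments = pattern.replace('\\', '/').split("/");
        StringBuilder prefix = new StringBuilder();
        // The last segment names a file or is a wildcard, so it never joins the root.
        for (int i = 0; i < segments.length - 1; i++) {
            String s = segments[i];
            if (s.indexOf('*') >= 0 || s.indexOf('?') >= 0) {
                break; // first wildcard segment ends the static prefix
            }
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(s);
        }
        return prefix.toString();
    }

    /**
     * Returns the minimal set of directories (relative to the fileset's dir)
     * that must be scanned. An empty string means "scan the whole dir".
     */
    static List<String> infer(List<String> includes) {
        // Sorting puts a parent directory before any of its children.
        TreeSet<String> sorted = new TreeSet<>();
        for (String pattern : includes) {
            sorted.add(staticPrefix(pattern));
        }
        List<String> roots = new ArrayList<>();
        for (String candidate : sorted) {
            boolean covered = false;
            for (String root : roots) {
                if (root.isEmpty() || candidate.startsWith(root + "/")) {
                    covered = true; // already inside a root we keep
                    break;
                }
            }
            if (!covered) {
                roots.add(candidate);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // The two patterns from the problem description above.
        System.out.println(infer(List.of("com/acme/foo/**", "com/acme/bar/**")));
        // prints [com/acme/bar, com/acme/foo]
    }
}
```

A pattern such as **/*.java has no static prefix, so the sketch falls back to scanning the whole dir, which is the current behavior. Whether selectors can safely be ignored this way is exactly the open question with the harder solution.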