ant-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 20103] - FileSet horrible performance when dir has huge number of subdirs
Date Tue, 03 Jun 2003 15:41:48 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20103

FileSet horrible performance when dir has huge number of subdirs

------- Additional Comments From robert@windermere.com  2003-06-03 15:41 -------
I seem to be running into the same performance issue with <fileset/>.

I have created a <fileset/> that describes a set of files to be downloaded from
an ftp server using the <ftp/> task.  I wish to download every file named in
a list.  Currently, this list is provided in a file, using the includesfile
attribute of <fileset/>.  Each line of that file names one file to
download, given as an absolute path from the ftp server root.  Each file in
the list could potentially be located in a unique subdirectory on the ftp
server.  No wildcards are used, and no pattern matching is required.

The build time increases substantially as the size of the includesfile increases.
I believe that the majority of the build time is spent not in actually
transferring the images, but in scanning the directories for pattern
matches.  In my case, this is seriously crippling, because those directory scans
run remotely against an ftp server.

A list of 10 files takes approx. 5 minutes.  A list of 35 files takes
approx. 10 minutes.  A list of 100 files takes approx. 30 minutes.  My
includesfile could contain approx. 2000 files.  With a list that large, the
ftp server times out before any files are transferred.

In all cases, success or otherwise, the task seems to hang for the 
majority of the processing time, prior to transferring any images.  It seems
that remote directory scanning is taking place at this point.  The 
logged message is shown below:

      [ftp] getting files

If the process makes it past this sticky point, then it rather quickly 
downloads the requested files with the following example logged messages:

      [ftp] transferring webphoto/bigphoto/96/23010396_04.jpg to /home/robert/dvl/collector/build/images/Northwest/webphoto/bigphoto/96/23010396_04.jpg
      [ftp] transferring webphoto/bigphoto/96/23010396_05.jpg to /home/robert/dvl/collector/build/images/Northwest/webphoto/bigphoto/96/23010396_05.jpg

I would concur with the previous suggestions as they pertain to <fileset/>. 
<fileset/> should only scan directories if required to satisfy some pattern
match.  In the case of fully qualified filenames, <fileset/> should take a less
"heavy-handed" approach, and simply include those files with no directory scanning.
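
To make the suggestion concrete, here is a rough sketch of the idea, not Ant's
actual code.  The class IncludePartition and its methods are hypothetical names
invented for illustration.  It assumes that an include entry needs a directory
scan only if it contains one of Ant's wildcard characters ('*' or '?') or ends
with a path separator (which Ant treats as shorthand for a trailing "**").
Everything else is a fully qualified name that could be fetched directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Ant: splits include entries into
// literal paths (no scan needed) and wildcard patterns (scan needed).
class IncludePartition {
    final List<String> literalPaths = new ArrayList<>();
    final List<String> wildcardPatterns = new ArrayList<>();

    // Assumption: an entry requires scanning only if it holds '*' or '?',
    // or ends with a separator (Ant reads "dir/" as "dir/**").
    static boolean needsScan(String entry) {
        return entry.indexOf('*') >= 0
            || entry.indexOf('?') >= 0
            || entry.endsWith("/")
            || entry.endsWith("\\");
    }

    // Sorts each non-blank line of an includesfile into one of the two lists.
    static IncludePartition of(List<String> includeLines) {
        IncludePartition result = new IncludePartition();
        for (String raw : includeLines) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip blank lines
            }
            if (needsScan(entry)) {
                result.wildcardPatterns.add(entry);
            } else {
                result.literalPaths.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IncludePartition p = IncludePartition.of(Arrays.asList(
            "webphoto/bigphoto/96/23010396_04.jpg",
            "webphoto/bigphoto/96/23010396_05.jpg",
            "webphoto/**/*.jpg"));
        System.out.println("fetch directly: " + p.literalPaths);
        System.out.println("scan required:  " + p.wildcardPatterns);
    }
}
```

With an includesfile like mine, every entry would land in literalPaths, so no
remote directory listing would be needed at all; only entries that end up in
wildcardPatterns would trigger a scan.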
