ant-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jaikiran <...@git.apache.org>
Subject [GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...
Date Wed, 31 Oct 2018 14:48:42 GMT
GitHub user jaikiran opened a pull request:

    https://github.com/apache/ant/pull/76

    bz-43144 - Improve the performance of the tar task when it uses a zipfileset

    https://bz.apache.org/bugzilla/show_bug.cgi?id=43144 is an issue where users have reported
that the tar task is extremely slow when used with a `zipfileset`. 
    
    The comment in that bug, from @bodewig, rightly notes that the reason for this slowness
has to do with the code where we iterate over the zip entries as a `Resource` and then open
an `InputStream` for each entry during the `tar` task. The implementation of the `ZipResource#getInputStream`
opens a new `ZipFile` every time and as Stefan notes in that bug, the reason it's done this
way is because we don't know what would be the right time to close a `ZipFIle` if the implementation
of the `ZipResource` would want to hold on to a open `ZipFile` instance and reuse it.
    
    However, there are occasions, like the one here, where the callers of the `ArchiveFileSet`
are aware and know when and how long they expect the underlying archive to be open. The commit
in this PR, introduces a new method (`openArchive()`) on the `ArchiveFileSet` to allow such
callers to have more control over the opening and closing of the underlying resource(s) like
the `ZipFile`. The whole goal of this new method is to allow iterating over the entries in
the archive (just like the existing `iterator()` method) and yet the same time exposing a
way for users to explicitly `close` the returned `ArchiveEntries`. When to use the `iterator()`
method and when to use the `openArchive` method will be left to the users of ArchiveSet.
    
    The commit in this PR intentionally just exposes only one method `openArchive` as a `public`
method and keeps the rest of the new methods either private or package private. Right now,
only `ZipFileSet` overrides the new package private method to provide a scanner which holds
on to open resource(s), if it's asked to do so.
    
    The changes in this commit maintain backward compatibility of the existing methods and
does _not_ introduce any change in behaviour of their usages or semantics.
    
    The javadocs of the new methods explain more about what each one does and how the usage
typically looks like.
    
    I have run a few of the existing Tar related tests locally and haven't found any regressions.
I have also run a manual test to see how the new performance with this change looks like.
I used a random zip file which is around 5MB in size and has 927 entries (as reported by the
unzip command):
    
    ```
    unzip -l source.zip
    <contents of the file> ....
    ---------                     -------
     22222385                     927 files
    ```
    I then used the following build file:
    
    ```
    <?xml version="1.0" encoding="UTF-8"?>
    <project name="test" default="main">
        <target name="main">
          <mkdir dir="build"/>	
          <tar destFile="build/test.tar" longfile="gnu">
          	<zipfileset src="source.zip"/>
           </tar>
        </target>
    </project>
    ```
    Ran the following command:
    ```
    time ant
    ```
    With Ant 1.10.5 (the latest released), it takes a while to complete and reports:
    
    ```
    Total time: 23 seconds
    
    real	0m23.485s
    user	0m16.767s
    sys	0m9.368s
    ```
    So around 23 seconds for the tar task.
    
    With this patch and the freshly built Ant distribution, this same build file (I did the
necessary cleanup of the  build generated tar file, before running it again) reports:
    
    ```
    Total time: 0 seconds
    
    real	0m1.068s
    user	0m1.994s
    sys	0m0.258s
    ```
    
    Around 1 second for the exact same task. So as expected there's a big improvement. I have
done basic comparison of the generated tar files, with Ant 1.10.5 and this patched version
to check it contains all the expected content and it looked fine. I will see if I can add
some kind of automated testing around this if possible. Until then I would like inputs on
whether this looks fine and any suggestions/feedback on this change. Right now, this is targetted
for master branch. I'll take a look later if it's easy (I guess, should be) to have this in
1.9.x too.
    
    P.S: The linked bug also has one comment which states that the `copy` task when it uses
a `zipfileset` is impacted by this performance issue too. I haven't checked or verified that.
At a later date, I'll take a look if it needs this same/similar fix.
    
    
    
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaikiran/ant bz-43144-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ant/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #76
    
----
commit 81b8c80c550ba560770a1f82de60c4d0b11ace91
Author: Jaikiran Pai <jaikiran@...>
Date:   2018-10-31T13:59:29Z

    bz-43144 - (performance fix) Explicitly control when an archive is opened and closed when
a zipfileset is used as a resource collection in the tar task

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org


Mime
View raw message