ant-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Benson <gudnabr...@gmail.com>
Subject Re: A performance issue in 'restrict'
Date Tue, 15 Jun 2010 16:38:17 GMT
On 6/15/10, Nicolas Lalevée <nicolas.lalevee@hibnet.org> wrote:
> On Monday 14 June 2010 20:01:08 Matt Benson wrote:
>> Hi, Nicolas:
>>
>> [SNIP]
>>
>> > I looked into the code, it seems
>> > org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
>> > BaseResourceCollectionWrapper which loads the entire underlying resource
>> > collection.
>> > I searched for the use of BaseResourceCollectionWrapper in Ant and I
>> > think that this performance issue may also affect
>> > org.apache.tools.ant.types.resources.SizeLimitCollection and
>> > org.apache.tools.ant.types.resources.Tokens.
>> >
>> > I started to think about a fix but it seems non trivial. If I'm not
>> > mistaken we would have to implement an iterator which should deal with
>> > an
>> > optionnal cache and concurrency.
>>
>> While it would be possible to reimplement Restrict's internal wrapper
>> such that it would dynamically calculate the included resources while
>> iterating, I don't see any way to do the same when the *size* of the
>> collection is requested...
>
> Yep if size is called then the whole collection has to be "fetched".
> I just hope that in my use case where I just want the first element of the
> restrict, the whole resource collection won't be fetched. And as I keep
> looking into the code, it seems that the implementation of
> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be
> an issue here.
>
>> > I am also worried about the isFilesystemOnly implementation in Restrict.
>> > Should it return true if actually selected resources are files, or
>> > return
>> > true if the entire set of candidates are files ? In the first case it
>> > would make the iterator useless.
>>
>> Typically the implementation of #isFilesystemOnly() in
>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
>> quick check whether the underlying collection/s is/are known to be
>> entirely filesystem-based.  If not, each resource is checked in turn
>> until one is found that is *not* filesystem-based.  The reason for the
>> distinction is to provide a means of recognizing whether it should be
>> possible to handle a given resourcecollection using a task that can
>> only work with files.
>
> So I am suggesting a little change on the implementation: removing the check
> isFilesystemOnly on the underlying resource collection; just keep the
> iteration on the actual resources of the underlying collection.
> I don't think it will change the semantic of the function, right?
> About performance, in my use case it should behave better as the <first>
> will
> only load one resource in the collection.
> I think there could be a performance impact where a resource collection is
> known to be file system only, and it is used with a task that cannot support
> that. The resources will be loaded but not actually used as the build will
> fail.
> So it seems a choice between:
>  * useless iteration on some resources which can have high latency
>  * useless iteration on some resources which should be on a filsytem and
> trigger a failed build.
>
> I prefer the second choice. I might not be impartial though ;)
>

I feel like if we get stuck on #isFilesystemOnly(), we're going to
lose sight of the important issue here.  This kind of diagnostic
method must, by definition, ultimately survey each resource, but
checking against the nested collection/s is a valuable optimization.
Consider the case where you may wrap a <first> in a <sort> in a
<restrict> of some other resourcecollection.  If each of these has to
fully iterate, that amounts to greater latency than attempting to
delegate that call all the way back to the innermost nested
collection, does it not?

The bigger issue is whether we can figure out a way to increase
throughput by delegating iterator calls.  The problem is that
BaseResourceCollectionWrapper and BaseResourceCollectionContainer both
implement the cache attribute by caching a collection of resources.
We could explore making the implementation of the cached collection
more efficient.  Once again, my understanding of your wider purpose
here is, given:

<first>
  <restrict>
    <someotherresourcecollection />
  </restrict>
</first>

You would like for the innermost collection to iterate only until
restrict accepts one of its resources.  Perhaps the original
implementation is naive in its simplicity.  It has been several years
now since the bulk of that code was added to Ant; if we can figure out
how to optimize it now I would be perfectly happy with that, but I
don't think #isFilesystemOnly() is germane to that efficiency
discussion.

-Matt

> Nicolas
>
>
>>
>> -Matt
>>
>> > Side note: I didn't checked that piece of script in as our Hudson
>> > instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>> >
>> > Nicolas
>> >
>> > [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>> >
>> > PS: I started recently to intensively use loadresource and all the
>> > resource related stuff in my build scripts, it is amazingly porwerful !
>> > It reminds me when I was 'pipelining' in cocoon ;)
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> > For additional commands, e-mail: dev-help@ant.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> For additional commands, e-mail: dev-help@ant.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org


Mime
View raw message