poi-dev mailing list archives

From Andreas Beeker <kiwiwi...@apache.org>
Subject Re: Test corpus vs. releases
Date Sun, 13 Nov 2016 19:12:08 GMT
The idea was to include only the part of test-data that the regular tests need,
excluding the integration tests, AND to have a separate corpus for the integration tests
that can be downloaded on demand.

The motivation was to keep the releases smaller.

For the second part, it would be nice to have different collections,
e.g. poi-basic (the additional files which are currently used for the integration tests),
tika (the Tika office corpus?), gov-docs, common-crawl, common-crawl-excel,
common-crawl-10gb (limited to 10 GB)


On 13.11.2016 18:05, Dominik Stadler wrote:
> Hm, as far as I can see we are including the test-data directory in the
> sources, so you should be able to run test-integration when you download just
> the source package, or am I missing something here?
> Dominik
> On Sun, Nov 13, 2016 at 4:57 PM, Javen O'Neal <onealj@apache.org> wrote:
>> +1 for this idea.
>> Possible solutions:
>> 1) Publish the commands for a sparse svn checkout on the website. It looks
>> like Subversion doesn't have a simple "svn checkout
>> https://svn.apache.org/repos/asf/poi/trunk poi --exclude-dir test-data",
>> but we could get the same behavior with a checkout at depth immediates,
>> followed by an update to depth infinity of every listed subdirectory
>> except test-data (e.g. filtered with awk).
>> This could be packaged into a bat/shell script, an Ant target, or a Gradle
>> target.
>> 2) Re-tree test-data to be a sibling of trunk. We would need some way
>> of pinning test-data so that old releases could still be run against these
>> documents without breaking.
>> 3) Migrate away from asking users to check out the source using a
>> Subversion client, using Gradle to perform this checkout instead (solution
>> 1).
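
A minimal sketch of option 1 (sparse svn checkout without test-data), assuming the `svn` command-line client is on the PATH (version 1.6 or later, for `--set-depth exclude`). It uses Python instead of the awk filtering mentioned in the thread, and the helper names `plan_sparse_checkout` and `main` are hypothetical. The script checks out trunk at depth immediates, excludes test-data, and updates every other top-level directory to depth infinity.

```python
"""Sketch: sparse Subversion checkout of POI trunk that skips test-data.

Assumes the `svn` command-line client (1.6+ for --set-depth exclude) is on PATH.
"""
import subprocess
import sys

TRUNK_URL = "https://svn.apache.org/repos/asf/poi/trunk"


def plan_sparse_checkout(url, wc, entries, excluded=("test-data",)):
    """Return the svn commands (as argv lists) for a sparse checkout.

    `entries` is the output of `svn list URL`, one entry per line;
    directory entries end with "/".
    """
    # Step 1: check out trunk with only its immediate children
    # (subdirectories are created at depth "empty").
    commands = [["svn", "checkout", "--depth", "immediates", url, wc]]
    for entry in entries:
        if not entry.endswith("/"):
            continue  # top-level files are already fetched by "immediates"
        name = entry.rstrip("/")
        path = f"{wc}/{name}"
        if name in excluded:
            # Step 2: remove the excluded directory from the working copy.
            commands.append(["svn", "update", "--set-depth", "exclude", path])
        else:
            # Step 3: fetch every other directory recursively.
            commands.append(["svn", "update", "--set-depth", "infinity", path])
    return commands


def main(wc="poi"):
    # Ask the repository for the top-level layout instead of hard-coding it.
    listing = subprocess.run(
        ["svn", "list", TRUNK_URL],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    for argv in plan_sparse_checkout(TRUNK_URL, wc, listing):
        print(" ".join(argv), file=sys.stderr)  # echo each command
        subprocess.run(argv, check=True)
```

Calling `main()` would create a working copy in `./poi` without `test-data`. Setting the excluded directory to depth `exclude` removes it from the working copy rather than leaving an empty stub behind.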
>> On Nov 13, 2016 5:41 AM, "Andreas Beeker" <kiwiwings@apache.org> wrote:
>>> Hi,
>>> our test corpus is constantly growing and I think this is good, as this
>>> covers the edge-cases in the integration tests.
>>> But I wonder if we need to include those files in the releases, e.g. we
>>> could make those files downloadable in case
>>> a user executes test-integration. Or maybe we find a way to have a common
>>> corpus with Tika ... but it should be
>>> easy to download/test those from within the Ant/Gradle scripts.
>>> What do you think?
>>> Andi
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: dev-help@poi.apache.org
