lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2269) don't download/extract 20,000 files when doing the build
Date Sun, 21 Feb 2010 11:23:27 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2269:
--------------------------------

    Attachment: LUCENE-2269.patch

Great idea, Mike. I removed all the unzipping code and changed the file to the smaller bz2,
which benchmark handles automagically.
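
For anyone unfamiliar with the format, here is a rough sketch (in Python, not the benchmark's actual Java code) of what reading a bz2-compressed line-doc file amounts to. It assumes one document per line with three tab-separated fields (title, date, body) and no header line; the function names and sample data are made up for illustration.

```python
import bz2
import os
import tempfile


def read_line_docs(path):
    """Yield (title, date, body) tuples from a line-doc file.

    Assumption: each line is title<TAB>date<TAB>body. Files ending in
    .bz2 are decompressed on the fly, so nothing is extracted to disk.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first two tabs only, so tabs in the body survive.
            title, date, body = line.split("\t", 2)
            yield title, date, body


def demo():
    """Write a tiny, hypothetical two-document file and read it back."""
    sample = (
        "Title one\t21-FEB-2010\tbody text one\n"
        "Title two\t21-FEB-2010\tbody with\ta tab\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.lines.txt.bz2")
        with bz2.open(path, "wt", encoding="utf-8") as out:
            out.write(sample)
        return list(read_line_docs(path))


if __name__ == "__main__":
    for doc in demo():
        print(doc)
```

The point is just that a single compressed file stands in for thousands of extracted ones, and the reader never has to unpack it first.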

I also added a note about this test for the future:

{noformat}
NOTE: if the default scoring or StandardAnalyzer is changed, then
this test will not work correctly, as it does not dynamically
generate its test trec topics/qrels!
{noformat}

This is nothing new, but in my opinion a future improvement would be to generate these files
dynamically. That would also test the QualityQueriesFinder functionality, but we would
need to add the 'fake documents', etc. for the test to work, too.

will commit shortly

> don't download/extract 20,000 files when doing the build
> --------------------------------------------------------
>
>                 Key: LUCENE-2269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2269
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: Build
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Trivial
>             Fix For: 3.1
>
>         Attachments: LUCENE-2269.patch, LUCENE-2269.patch, reuters.578.lines.zip
>
>
> When you build lucene, it downloads and extracts some data for contrib/benchmark, especially the 20,000+ files for the reuters corpus.
> this is only needed for one test, and these 20,000 files drive IDEs and such crazy.
> instead of doing this by default, we should only download/extract data if you specifically ask (like wikipedia, collation do, etc)
> for the qualityrun test, instead use a linedoc formatted 587-line text file, similar to reuters.first20.lines.txt already used by benchmark.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

