lucene-dev mailing list archives

From "Steven Parkes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
Date Thu, 28 Jun 2007 14:06:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508833 ]

Steven Parkes commented on LUCENE-848:
--------------------------------------

Trying to reproduce now.

Something that came up while restarting the fetch/decompress/etc. steps was the number of files
this procedure creates. It's a lot: one for each article. I used the existing benchmark code
for this, but perhaps that approach doesn't hold up at this scale. For one thing, it kinda
kills ant, since ant walks the directory subtrees for some of its tasks. We need to exclude
the work and temp directories from ant's walks, come up with something better than one file
per article, or both.

I think Mike mentioned not doing the one file per article. I'll try to look at that ...
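One possible alternative to one file per article, sketched here only as an illustration: write
every article as a single line of one file (title, a tab, then the body). The class name
`LineDocWriter` and the tab-separated layout are hypothetical assumptions, not existing
benchmark code.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

/**
 * Hypothetical sketch (not existing contrib/benchmark code): store every article
 * as one line of a single file instead of one file per article.
 */
public class LineDocWriter implements Closeable {
    private static final char SEP = '\t';
    private final BufferedWriter out;

    public LineDocWriter(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Tabs and line breaks would corrupt the one-line record, so flatten them to spaces. */
    static String flatten(String s) {
        return s.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    /** Appends one article as "title<TAB>body" on its own line. */
    public void addArticle(String title, String body) throws IOException {
        out.write(flatten(title));
        out.write(SEP);
        out.write(flatten(body));
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    /** Reads the file back as {title, body} pairs, one per line. */
    public static List<String[]> readAll(Path file) throws IOException {
        List<String[]> docs = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf(SEP); // first tab separates title from body
            docs.add(new String[] { line.substring(0, tab), line.substring(tab + 1) });
        }
        return docs;
    }
}
```

With everything in one file there is no per-article subtree for ant to walk; the tradeoff is that
readers have to stream lines rather than open individual files.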

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-848
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Steven Parkes
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
>                      LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java,
>                      xerces.jar, xerces.jar, xml-apis.jar
>
>
> Add support for using Wikipedia for benchmarking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



