lucene-dev mailing list archives

From "Steven Parkes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
Date Wed, 27 Jun 2007 21:26:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508661 ]

Steven Parkes commented on LUCENE-848:
--------------------------------------

Actually, I just noticed wikimedia provides the md5 hashes. I was able to validate my copy.

I don't actually remember if I got my copy from wikimedia or from p.a.o.

The copy in your ls -l looks bad, both from the sha1sum and from the size. It looks like your
file is truncated: its length is about 455M (if 477278208 is the size in bytes), while the real
file is 2686431976 bytes (about 2.5G).

Can you check the file on p.a.o, both the size and the md5 hash? The latter should be
fc24229da9af033cbb55b9867a950431
(http://download.wikimedia.org/enwiki/20070527/enwiki-20070527-md5sums.txt)
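For anyone who wants to script the check requested above, here is a minimal sketch in Python. It is not part of the benchmark code or any of the attached patches. The expected size and MD5 are the values quoted in this message; the function names md5_of_file and check_dump are made up for illustration, and the path to the downloaded dump is left to the caller.

```python
import hashlib
import os

# Values quoted in the message above for the 20070527 enwiki dump.
EXPECTED_SIZE = 2686431976
EXPECTED_MD5 = "fc24229da9af033cbb55b9867a950431"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks so a
    multi-gigabyte dump never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_dump(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Hypothetical helper: compare size first (cheap), then MD5 (slow).
    Returns a (passed, message) tuple."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return False, "size mismatch: %d bytes, expected %d" % (
            actual_size, expected_size)
    actual_md5 = md5_of_file(path)
    if actual_md5 != expected_md5:
        return False, "md5 mismatch: %s, expected %s" % (
            actual_md5, expected_md5)
    return True, "ok"
```

Calling check_dump with the local path of the downloaded file returns (True, "ok") only when both the byte count and the hash match. A truncated download fails on the size comparison, so the slower hash pass is skipped.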

I should be able to launch a test of the unzip/extract tonight. It takes a while.

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-848
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Steven Parkes
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
>                      LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java,
>                      xerces.jar, xerces.jar, xml-apis.jar
>
>
> Add support for using Wikipedia for benchmarking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

