lucene-dev mailing list archives

From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject Re: Free, medium size, downloadable corpus of newspaper articles ?
Date Wed, 30 Jul 2003 13:30:10 GMT

Hello,

The industry standard is a collection called Reuters-21578.

http://www.daviddlewis.com/resources/testcollections/reuters21578/

That should do,

Karsten


-----Original Message-----
From: Jos van der Meer [mailto:jmee@aidministrator.nl]
Sent: Wednesday, 30 July 2003 15:20
To: lucene-dev@jakarta.apache.org
Subject: Free, medium size, downloadable corpus of newspaper articles ?



For my experiments with Lucene, I would like to have a publicly available
free, medium size, downloadable corpus of newspaper articles
(topics do not matter, nor does its publication date).

I would like to share the results of the experiments, and people
should be able to reproduce and extend them.

Don't send the corpora themselves (..), but please send me their URLs.

Thanks in advance,


jos.van.der.meer@aidministrator.nl
aidministrator nederland bv  -  http://www.aidministrator.nl/
prinses julianaplein 14-b, 3817 cs amersfoort, the netherlands
tel. +31-(0)33-4659987   fax. +31-(0)33-4659987


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org



