lucene-dev mailing list archives

From Matt Quail <m...@ctx.com.au>
Subject looking for a large test corpus for a lucene presentation
Date Wed, 07 Apr 2004 09:07:20 GMT
Hi all,

I'm doing a presentation to my local JUG on Lucene, and I'm looking for 
a "good" set of documents to use as a demonstration.

Ideally it would be:
1) large (10,000 documents or more?).
2) rich in metadata besides "body" (like author, date, primary key, 
etc).
3) freely available.

I was going to use the data from the previous Google programming 
contest, but it doesn't seem to be available.

If I can't find anything satisfactory, I'll probably:
- generate a fake white pages phonebook
- grab documents from Project Gutenberg
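
For the phonebook fallback, here is a minimal sketch of what generating such a corpus might look like. It is in Python only for brevity; the name lists, field names, and phone format are all made-up placeholders, not anything Lucene requires.

```python
import random

# Hypothetical seed lists; a real demo would want far more variety.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Emma", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Brown", "Wilson", "Taylor"]
SUBURBS = ["Northside", "Riverview", "Hilltop", "Lakeside"]


def fake_phonebook(count, seed=42):
    """Yield `count` fake entries as dicts, one key per metadata field."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    for key in range(1, count + 1):
        yield {
            "primarykey": key,  # assumed field name, unique per entry
            "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "suburb": rng.choice(SUBURBS),
            "phone": f"555-{rng.randrange(10000):04d}",
        }


entries = list(fake_phonebook(10000))
```

Each dict could then be mapped to one indexed document, with one field per key.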

My preference is for some "real" data, but I'm happy to generate fake 
data if no-one has any better ideas.

:D

=Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

