lucene-dev mailing list archives

From "Magnus Johansson" <mag...@technohuman.com>
Subject Re: looking for a large test corpus for a lucene presentation
Date Wed, 07 Apr 2004 11:41:46 GMT
I have used some posts from Usenet. There are many
of them, and they contain metadata.

/magnus


> Hi all,
>
> I'm doing a presentation to my local JUG on Lucene, and I'm looking for
> a "good" set of documents to use as a demonstration.
>
> Ideally it would:
> 1) be large (10,000 plus?).
> 2) contain some metadata besides "body" (like author, date, primary key,
> etc).
> 3) be freely available.
>
> I was going to use the data from the previous Google programming
> contest, but it doesn't seem to be available.
>
> If I can't find anything satisfactory, I'll probably:
> - generate a fake whitepages phonebook
> - grab documents from Project Gutenberg
>
> My preference is for some "real" data, but I'm happy to generate fake
> data if no-one has any better ideas.
>
> :D
>
> =Matt
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>



