lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Indexing Wikipedia dumps
Date Wed, 12 Dec 2007 05:35:05 GMT
Hi,

I need to index a Wikipedia dump.  I know there is code in contrib/benchmark for indexing
*English* Wikipedia for benchmarking purposes.  However, I'd like to index a non-English dump,
and I don't actually need it for benchmarking; I just want to end up with a Lucene index.

Any suggestions on where I should start?  That is, can anything in contrib/benchmark already
do this, or is there anything there that I could use as a starting point, as opposed to
writing my own Wikipedia XML dump parser+indexer?
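
For reference, here is a minimal sketch of what the hand-written route could look like, using
only the JDK's StAX reader.  The names WikiDumpSketch, Page and parse are hypothetical.  The
sketch assumes the usual MediaWiki export layout (<page> elements holding a <title> and a
<revision>/<text>) and a dump with one revision per page.  It stops short of Lucene itself:
the callback is where a Document would be built and handed to IndexWriter.addDocument, with
an Analyzer suited to the dump's language.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WikiDumpSketch {

    /** One article: its title plus the raw wiki markup of its text. */
    public static final class Page {
        public final String title;
        public final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /**
     * Streams through the dump and calls handler once per <page> element.
     * Nothing is buffered beyond the current page, so memory use does not
     * grow with the size of the dump.
     */
    public static void parse(Reader in, Consumer<Page> handler) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Dumps carry no DTD; turning DTD support off is a defensive default.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(in);

        String title = null;
        String text = null;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // getLocalName ignores the export namespace, so the same
                // code works whichever schema version the dump declares.
                String name = xml.getLocalName();
                if (name.equals("page")) {
                    title = null;
                    text = null;
                } else if (name.equals("title")) {
                    title = xml.getElementText();
                } else if (name.equals("text")) {
                    // With several revisions per page, the last one wins.
                    text = xml.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("page")) {
                handler.accept(new Page(title, text == null ? "" : text));
            }
        }
        xml.close();
    }
}
```

For a real dump the Reader would wrap the (decompressed) file opened as UTF-8, and the
handler would do the indexing instead of collecting pages.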

Thanks,
Otis



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

