lucene-java-user mailing list archives

From Daniel Quach <>
Subject Using Lucene to index Wikipedia
Date Thu, 20 Oct 2011 16:30:17 GMT
How do I use the Lucene Benchmark to index a Wikipedia dump? I want to 
be able to execute phrase queries on the latest English Wikipedia page 
dump. I've been looking for example use cases, but I haven't found any.

I downloaded the latest English dump, named: 
Then I ran the command in the terminal:
java org.apache.lucene.benchmark.utils.ExtractWikipedia -i 

which I believe extracted the pages into a directory named "enwiki".

Is there something else in the benchmark package that I need to run in 
order to index the wiki? The README.enwiki does not give me a clear set 
of instructions; in fact, I'm not even sure whether I was supposed to 
run the ExtractWikipedia class at all.
