lucene-java-user mailing list archives

From "Zeynep P." <zp...@yahoo.com>
Subject Wikipedia revision history dump + lucene benchmark
Date Tue, 10 Apr 2012 17:33:51 GMT
The wikipedia.alg algorithm in the benchmark module can only extract and
index current-pages dumps; it does not take revisions into account. Do you
know of any way to index the revision history as well? Or should I change
EnwikiContentSource to handle the revisions?
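
To make the second option concrete, here is a rough sketch of the parsing step only. It is not how EnwikiContentSource works and it is not wired into the benchmark module. It assumes the full-history dump nests several <revision> elements (each with <id>, <timestamp>, <text>) inside every <page>, and that each revision should become its own document. The class RevisionDumpSketch, the Revision holder and the readRevisions method are hypothetical names used for illustration.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of Lucene: emits one record per revision element. */
final class RevisionDumpSketch {

    /** One revision of one page; each could be indexed as a separate document. */
    static final class Revision {
        final String title;
        final String revisionId;
        final String timestamp;
        final String text;

        Revision(String title, String revisionId, String timestamp, String text) {
            this.title = title;
            this.revisionId = revisionId;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    /** Streams the dump and hands every revision to the sink as soon as it is closed. */
    static void readRevisions(Reader in, Consumer<Revision> sink) {
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            StringBuilder buf = new StringBuilder();
            String title = null, revId = null, timestamp = null, text = null;
            boolean inRevision = false, inContributor = false;

            while (xml.hasNext()) {
                switch (xml.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        buf.setLength(0);
                        if ("revision".equals(xml.getLocalName())) {
                            inRevision = true;
                            revId = timestamp = text = null;
                        } else if ("contributor".equals(xml.getLocalName())) {
                            inContributor = true;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so accumulate it.
                        buf.append(xml.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        switch (xml.getLocalName()) {
                            case "title":
                                title = buf.toString();
                                break;
                            case "id":
                                // Skip the page id and the contributor id.
                                if (inRevision && !inContributor && revId == null) {
                                    revId = buf.toString();
                                }
                                break;
                            case "timestamp":
                                if (inRevision) timestamp = buf.toString();
                                break;
                            case "text":
                                if (inRevision) text = buf.toString();
                                break;
                            case "contributor":
                                inContributor = false;
                                break;
                            case "revision":
                                sink.accept(new Revision(title, revId, timestamp, text));
                                inRevision = false;
                                break;
                            default:
                                break;
                        }
                        buf.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed dump XML", e);
        }
    }
}
```

A content source built along these lines would presumably copy each Revision into the benchmark's document data (title, date, body) instead of calling a sink, but that wiring is left out here.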

Although Wikipedia dumps are widely used, especially for research purposes,
as far as I know there are no topics/qrels for them (except the one at
http://www.mpi-inf.mpg.de/~kberberi/ecir2010/ for the revision history
dump 2001 - 2005, which is annotated based on temporal expressions). Do you
know of any others?

By the way, I think that in wikipedia.alg
query.maker=org.apache.lucene.benchmark.byTask.feeds.*ReutersQueryMaker*
should be replaced by *EnwikiQueryMaker*.

Thanks in advance,
Best regards
-- 
ZP