lucene-java-user mailing list archives

From "Zeynep P." <>
Subject test LA Times with pruning package
Date Fri, 14 Sep 2012 12:26:25 GMT
Hi to all,

I used the pruning package with the LA Times collection. The initial LA Times
index was created with the Lucene benchmark (benchmark/conf/*.alg). Luke shows
131896 documents and 635614 terms for the initial index. I pruned it with the
CarmelTopKPruning policy with epsilon = 0.1, varying k. However, my results do
not match those of the original paper (Static Index Pruning for Information
Retrieval Systems by Carmel et al.). Lucene's scoring function may be part of
the reason, but the difference is large, so I wonder whether the package has
been tested with LA Times and whether similar results were obtained.

What could be the reason for such a difference? I count the number of postings
by summing the document frequency of every term, i.e. for each term:
counter += te.docFreq();

Do you know of any paper that uses this package for its experiments?

k    Pruned (%),      Pruned (%),       # postings in   # postings in
     original paper   pruning package   pruned index    unpruned index

1    49.2             91                3663309         37860694
5    40.2             90                4139019
10   36.4             89                4485072
15   34.2             88                4743474
50   x                69                11990022
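
For reference, here is a minimal sketch of how the pruning percentages relate to the posting counts in the table. It assumes that "pruned (%)" means the share of postings removed relative to the unpruned index, which may not be the exact definition used in the paper. The class and method names (PruneRatio, prunedPercent) are made up for illustration and are not part of the pruning package.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or the pruning package.
class PruneRatio {

    // Assumed definition: percentage of postings removed by pruning,
    // relative to the posting count of the unpruned index.
    static double prunedPercent(long prunedPostings, long originalPostings) {
        if (originalPostings <= 0) {
            throw new IllegalArgumentException("originalPostings must be positive");
        }
        return 100.0 * (1.0 - (double) prunedPostings / originalPostings);
    }

    public static void main(String[] args) {
        // Posting counts copied from the table in this message.
        long original = 37860694L;
        int[] ks = {1, 5, 10, 15, 50};
        long[] pruned = {3663309L, 4139019L, 4485072L, 4743474L, 11990022L};

        for (int i = 0; i < ks.length; i++) {
            System.out.printf(Locale.ROOT, "k=%d pruned=%.2f%%%n",
                    ks[i], prunedPercent(pruned[i], original));
        }
    }
}
```

Under this definition the counts give values a little below the rounded figures in the package column (about 90.3 for k = 1, for example), so the rounding does not explain the gap to the paper.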

Thanks in advance,
Best Regards
