Date: Fri, 14 Sep 2012 05:26:25 -0700 (PDT)
From: "Zeynep P."
To: java-user@lucene.apache.org
Subject: test LA Times with pruning package

Hi all,

I used the pruning package with the LA Times collection.
The initial LA Times index was created using the Lucene benchmark/conf/*.alg files. Luke shows 131896 documents with 635614 terms for the initial index. I pruned it with the CarmelTopKPruning policy with epsilon = 0.1, varying k. However, my results do not match those of the original paper (Static Index Pruning for Information Retrieval Systems by Carmel et al.). The Lucene scoring function may be part of the reason, but the difference is large, so I wonder whether the package has been tested with LA Times and whether similar results were obtained. What could be the reason for such a difference?

I count the number of postings by iterating over all terms and accumulating, for each term:

    counter += te.docFreq();

Do you know of any paper that uses this package for experiments?

 k   Prune (%)        Prune (%)         # postings       # postings
     original paper   pruning package   (pruned index)   (no pruning)
 1   49.2             91                3663309          37860694
 5   40.2             90                4139019
10   36.4             89                4485072
15   34.2             88                4743474
50   x                69                11990022

Thanks in advance,
Best regards,
ZP
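The counting and percentage arithmetic described in the message can be sketched without any Lucene dependency. This is only a minimal sketch under stated assumptions, not the pruning package's code: the names PruneStats, totalPostings, and prunedPercent are hypothetical, and a plain Map of term to document frequency stands in for walking the index's term enumeration and calling te.docFreq(). It also assumes that "Prune (%)" means the share of postings removed relative to the unpruned index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: posting counts and pruning percentage, no Lucene needed. */
final class PruneStats {

    /** Sums per-term document frequencies, mirroring "counter += te.docFreq()". */
    static long totalPostings(Map<String, Integer> docFreqByTerm) {
        long counter = 0L; // long leaves headroom for indexes larger than this one
        for (int docFreq : docFreqByTerm.values()) {
            counter += docFreq;
        }
        return counter;
    }

    /** Percentage of postings removed, relative to the unpruned index (assumed definition). */
    static double prunedPercent(long prunedPostings, long originalPostings) {
        if (originalPostings <= 0) {
            throw new IllegalArgumentException("originalPostings must be positive");
        }
        return 100.0 * (1.0 - (double) prunedPostings / originalPostings);
    }

    public static void main(String[] args) {
        // Toy stand-in for the term dictionary: term -> docFreq.
        Map<String, Integer> toy = new LinkedHashMap<>();
        toy.put("lucene", 3);
        toy.put("index", 2);
        toy.put("prune", 1);
        System.out.println("toy postings: " + totalPostings(toy));

        // Posting counts as reported in the message.
        long original = 37_860_694L;
        int[] ks = {1, 5, 10, 15, 50};
        long[] pruned = {3_663_309L, 4_139_019L, 4_485_072L, 4_743_474L, 11_990_022L};
        for (int i = 0; i < ks.length; i++) {
            System.out.printf("k=%d pruned=%.2f%%%n", ks[i], prunedPercent(pruned[i], original));
        }
    }
}
```

Running main recomputes the pruning percentages from the reported posting counts, which makes it easier to check that the figures being compared with the paper use the same definition.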