From: Andrzej Bialecki
Date: Wed, 13 Jun 2012 16:19:46 +0200
To: java-user@lucene.apache.org
Subject: Re: Index pruning

On 30/05/2012 03:39, Greg Bowyer wrote:
> Hi all
>
> I am playing about with the index pruning contrib package, I want to see
> if it will make a faster and slightly smaller index for me. However when
> I try either Carmel or RIDF methods it just ends up deleting all my
> postings for the two fields of interest.
>
> My command line for RIDF is as follows, any ideas what I could be doing
> wrong ?
>
> java -cp ./lucene-pruning-3.6.1-sz_release.jar:./lucene-core-3.6.1_sz_release.jar \
>   org.apache.lucene.index.pruning.PruningTool \
>   -del title:pPsv,descr:pPsv \
>   -in ./index -out ./pruned-index2 \
>   -impl ridf -t -0.1

Hey Greg,

Sorry for the late answer ...
The field spec string "pPsv" configures the PruningPolicy to completely delete the postings, so it's doing what you told it to do ;) The actual pruning would have happened at a later stage, but since the postings are removed first, there is nothing left to prune.

If your intention was to apply pruning only to selected fields, there is no command-line option for this in the tool. However, it's easy to add one, because the *Policy implementations usually take a Map to specify thresholds, where keys are either field names or field:term pairs.

-- 
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
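To make the threshold-map idea concrete, here is a minimal Java sketch of how a per-field option could build and resolve such a Map. Everything in it is hypothetical: the class name ThresholdSpec, the comma-separated "field:value" / "field:term:value" spec format, and the lookup order (field:term first, then field, then a default) are assumptions for illustration. It is not the API of the Lucene pruning contrib, and passing the resulting Map into a particular *Policy constructor is left out because that signature isn't given here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, NOT part of the Lucene pruning contrib: holds pruning
 * thresholds keyed either by field name ("title") or by a field:term pair
 * ("descr:lucene"), with a fallback default for everything else.
 */
final class ThresholdSpec {

    private final Map<String, Double> thresholds;
    private final double defaultThreshold;

    ThresholdSpec(Map<String, Double> thresholds, double defaultThreshold) {
        this.thresholds = new HashMap<>(thresholds);
        this.defaultThreshold = defaultThreshold;
    }

    /**
     * Parses an assumed spec format such as "title:0.1,descr:lucene:0.05".
     * The value is whatever follows the LAST colon, so the key may itself be
     * a field:term pair and negative values like "title:-0.1" still parse.
     */
    static ThresholdSpec parse(String spec, double defaultThreshold) {
        Map<String, Double> map = new HashMap<>();
        for (String raw : spec.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int sep = entry.lastIndexOf(':');
            if (sep <= 0 || sep == entry.length() - 1) {
                throw new IllegalArgumentException("Bad threshold entry: " + entry);
            }
            String key = entry.substring(0, sep);
            double value = Double.parseDouble(entry.substring(sep + 1));
            map.put(key, value);
        }
        return new ThresholdSpec(map, defaultThreshold);
    }

    /** Most specific key wins: field:term, then field, then the default. */
    double thresholdFor(String field, String term) {
        Double t = thresholds.get(field + ":" + term);
        if (t == null) {
            t = thresholds.get(field);
        }
        return t != null ? t : defaultThreshold;
    }

    public static void main(String[] args) {
        ThresholdSpec spec = ThresholdSpec.parse("title:0.1,descr:lucene:0.05", 0.0);
        System.out.println(spec.thresholdFor("title", "anything")); // field-level entry
        System.out.println(spec.thresholdFor("descr", "lucene"));   // field:term entry
        System.out.println(spec.thresholdFor("descr", "other"));    // falls back to default
        System.out.println(spec.thresholdFor("body", "x"));         // unknown field, default
    }
}
```

Resolving the most specific key first lets a single spec set a broad per-field threshold and still override it for individual terms; fields that are not mentioned at all fall through to the default, so they are left to whatever behaviour the default threshold implies.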