From: Rene Hackl-Sommer
Date: Tue, 16 Mar 2010 10:45:24 +0100
To: java-user@lucene.apache.org
Subject: Re: "Deleting" documents without deleting them
Message-ID: <4B9F5334.1050905@gmx.de>

Hi Daniel,

Unless you have only a few documents and a small index, I don't think
you should rely on never calling optimize(): ordinary segment merges
can also purge deleted documents.

What about re-indexing the documents you are deleting, adding a field
"excludeFromSearch" with the value "true"? This would imply that either

1) all fields are stored, so you can retrieve them from the original
   doc and add them to the new one together with the exclusion field, or
2) if a lot of fields are only indexed, you have access to the original
   source. (With limitations it is also possible to reconstruct a field
   from indexed data only, but that is not generally recommended.)

During search, just add "NOT excludeFromSearch:true" to the query.

If you need to keep track of which versions belong together, you may
need to think about how you uniquely identify documents, how this
changes between versions, and whether the update dates might be of any
help.

Cheers
Rene

On 16.03.2010 05:20, Daniel Noll wrote:
> Hi all.
>
> I'm trying to implement a form of document deletion where the previous
> versions are kept around forever (a primitive form of versioning) but
> excluded from the search results.
>
> I notice that after calling IndexWriter.deleteDocuments, even if you
> close and reopen the index, the documents are still accessible using
> document(int) but are not returned from queries, which is exactly the
> behaviour I want. However, if I call optimize() they will obviously
> be obliterated.
>
> My question is: as long as I never call optimize() -- will the deleted
> documents hang around forever, or will a merge due to adding the new
> documents eventually cause them to be removed?
>
> If they will be removed then I need some other way to avoid them being
> returned. I was thinking of actually *not* deleting them, but
> maintaining a giant filter - I could store this filter on disk but
> it's going to be pretty large even if I use a BitSet. :-( Is there
> any other way to go about it?
>
> Daniel
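
The re-indexing idea from the reply (copy the stored fields, add an
exclusion field, and filter on it at query time) can be sketched with a
toy in-memory model. This is not Lucene code: the class SoftDeleteSketch,
its methods, the "id" and "body" fields, and the substring matching are
all hypothetical stand-ins chosen for illustration. Only the field name
"excludeFromSearch" and the "NOT excludeFromSearch:true" clause come
from the message. In Lucene itself the replacement step would presumably
go through IndexWriter.updateDocument and the exclusion through a
MUST_NOT clause, but that mapping is an assumption here, not something
the thread spells out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index. Hypothetical; NOT Lucene's API. */
class SoftDeleteSketch {
    static final String EXCLUDE_FIELD = "excludeFromSearch";

    // Each "document" is a map of stored field name -> value,
    // keyed by a hypothetical unique id.
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    /** Index a new document with a stored id and body. */
    void add(String id, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("body", body);
        docs.put(id, doc);
    }

    /** "Delete" by re-indexing: copy the stored fields, add the marker. */
    void softDelete(String id) {
        Map<String, String> original = docs.get(id);
        if (original == null) {
            return; // nothing to mark
        }
        Map<String, String> replacement = new LinkedHashMap<>(original);
        replacement.put(EXCLUDE_FIELD, "true");
        docs.put(id, replacement);
    }

    /** Models "body:term NOT excludeFromSearch:true" (substring match only). */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, String> doc : docs.values()) {
            boolean matches = doc.get("body").contains(term);
            boolean excluded = "true".equals(doc.get(EXCLUDE_FIELD));
            if (matches && !excluded) {
                hits.add(doc.get("id"));
            }
        }
        return hits;
    }

    /** Old versions remain retrievable by id even when excluded from search. */
    Map<String, String> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        SoftDeleteSketch index = new SoftDeleteSketch();
        index.add("report-v1", "quarterly numbers");
        index.softDelete("report-v1");            // keep it, but hide it
        index.add("report-v2", "quarterly numbers revised");

        System.out.println(index.search("quarterly"));   // prints [report-v2]
        System.out.println(index.get("report-v1") != null); // prints true
    }
}
```

The point of the design is that nothing is ever removed from the index,
so neither optimize() nor a merge can discard the old versions; the cost
is one extra clause on every query and the need for stored fields (or
the original source) when re-indexing.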