Subject: Re: "Deleting" documents without deleting them
From: Michael McCandless
To: java-user@lucene.apache.org
Date: Tue, 16 Mar 2010 04:32:50 -0500

An incidental merge will delete them.

I think you'll have to maintain your own filter... but it shouldn't be that large?
I.e., it's as large as the deleted-docs BitVector would be anyway... except that the docs never go away.

Mike

On Mon, Mar 15, 2010 at 11:20 PM, Daniel Noll wrote:
> Hi all.
>
> I'm trying to implement a form of document deletion where the previous
> versions are kept around forever (a primitive form of versioning) but
> excluded from the search results.
>
> I notice that after calling IndexWriter.deleteDocuments, even if you
> close and reopen the index, the documents are still accessible using
> document(int) but are not returned from queries, which is exactly the
> behaviour I want. However, if I call optimize() they will obviously
> be obliterated.
>
> My question is: as long as I never call optimize() -- will the deleted
> documents hang around forever, or will a merge due to adding the new
> documents eventually cause them to be removed?
>
> If they will be removed then I need some other way to avoid them being
> returned. I was thinking of actually *not* deleting them, but
> maintaining a giant filter - I could store this filter on disk but
> it's going to be pretty large even if I use a BitSet. :-( Is there
> any other way to go about it?
>
> Daniel
>
> --
> Daniel Noll                 Forensic and eDiscovery Software
> Senior Developer            The world's most advanced
> Nuix                        email data analysis
> http://nuix.com/            and eDiscovery software
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
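
To put a rough number on the "maintain your own filter" idea above, here is a minimal sketch of a persistent bit set of hidden documents. It deliberately uses plain java.util.BitSet rather than any Lucene Filter class, and the names HiddenDocs, hide, visible and approxBytes are made up for illustration. It also assumes document IDs stay stable because nothing is ever really deleted; that is an assumption you would want to verify against your own merge settings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: tracks superseded document IDs to exclude from results. */
final class HiddenDocs {
    // One bit per document ID; a set bit means "hide this document".
    private final BitSet hidden;

    HiddenDocs() {
        this(new BitSet());
    }

    private HiddenDocs(BitSet bits) {
        this.hidden = bits;
    }

    /** Mark a document as superseded. It stays in the index but is filtered out. */
    void hide(int docId) {
        hidden.set(docId);
    }

    boolean isHidden(int docId) {
        return hidden.get(docId);
    }

    /** Drop hidden documents from a list of hit IDs, preserving order. */
    List<Integer> visible(List<Integer> hits) {
        List<Integer> out = new ArrayList<>();
        for (int id : hits) {
            if (!hidden.get(id)) {
                out.add(id);
            }
        }
        return out;
    }

    /** Write the bit set to disk as raw bytes (assumed storage format). */
    void save(Path file) throws IOException {
        Files.write(file, hidden.toByteArray());
    }

    /** Read a bit set previously written by save(). */
    static HiddenDocs load(Path file) throws IOException {
        return new HiddenDocs(BitSet.valueOf(Files.readAllBytes(file)));
    }

    /** Approximate bytes needed for maxDoc documents at one bit each. */
    static long approxBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

At one bit per document, approxBytes(10000000) comes to 1,250,000 bytes, so roughly 1.25 MB for ten million documents. In a real index you would presumably wrap the same bits in a search-time filter instead of post-filtering hit lists as visible() does here.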