From: Niels Ott
Date: Fri, 28 Nov 2008 21:31:05 +0100
To: java-user@lucene.apache.org
Subject: Deleting from Index by URL field: is it safe?
Message-ID: <49305509.5000409@sfs.uni-tuebingen.de>

Hi all,

I want to safely delete documents from my index. There is a URL field that specifies where the document came from.
I'm using something like this:

    indexwriter.deleteDocuments(new Term("URL", myURL));

(inspired by the Lucene in Action book, page 35). I'm uncertain whether this is safe: is there a chance that I delete documents I want to keep? How exactly does the matching work?

During indexing, I'm using a KeywordAnalyzer for the URL field in order to avoid tokenization.

Best,

Niels

-- 
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/
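
To make the matching question concrete, here is a toy model in plain Java. It is not Lucene code: the class ToyIndex and its methods are hypothetical stand-ins, and the example URLs are made up. It assumes that an untokenized URL field is stored as one single term, and that deletion by term compares the given text verbatim against that stored term, with no analysis or normalization applied.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for an index with one untokenized "URL" term per document. Not Lucene. */
class ToyIndex {
    // One entry per document: the whole URL kept as a single term.
    private final List<String> urlTerms = new ArrayList<>();

    /** Adds a document whose URL field is stored as one unsplit term. */
    void add(String url) {
        urlTerms.add(url);
    }

    /** Removes every document whose URL term equals termText exactly; returns how many were removed. */
    int deleteByTerm(String termText) {
        int removed = 0;
        for (Iterator<String> it = urlTerms.iterator(); it.hasNext(); ) {
            if (it.next().equals(termText)) { // exact, case-sensitive comparison
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of documents still in the toy index. */
    int size() {
        return urlTerms.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("http://example.org/a");
        index.add("http://example.org/a/"); // trailing slash: a different term
        index.add("http://example.org/A");  // different case: a different term
        index.add("http://example.org/a");  // same URL indexed a second time

        int removed = index.deleteByTerm("http://example.org/a");
        // prints "removed=2, remaining=2"
        System.out.println("removed=" + removed + ", remaining=" + index.size());
    }
}
```

Under these assumptions, only documents whose stored URL is character-for-character identical to the given string are removed, and every such document goes, including duplicates. Variants of the same address (trailing slash, different case) are left untouched.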