Date: Sun, 8 Jul 2007 09:39:58 +0300
From: "Nadav Har'El"
To: java-user@lucene.apache.org
Subject: Re: problems with deleteDocuments
Message-ID: <20070708063958.GA14351@fermat.math.technion.ac.il>

On Wed, Jul 04, 2007, Erick Erickson wrote about "Re: problems with deleteDocuments":
> Consider what would happen otherwise. Say you have documents
> with the following values for a field (call it blah).
> some data
> some data I put in the index
> lots of data
> data
>
> Then I don't want deleting on the term blah:data to remove all
> of them. Which seems to be what you're asking. Even if
> you restricted things to "phrases", then deleting on the term
> 'blah:some data' would remove two documents.
>
> So, while UN_TOKENIZED isn't a *requirement*, exact total term
> matches *is* the requirement. By that, I meant that whatever
> goes into the field must not be broken into pieces by the indexing
> tokenizer for deletes to work as you expect.

I disagree, and frankly, am very surprised that "exact total term matches"
is actually a requirement (I never tried it, so you may be absolutely right,
I just hope you aren't).

Let me give you just one example where id fields containing multiple words,
and the ability for a delete query to match several documents, are useful.

Consider an application for indexing emails with attachments. The email
text, and each document attachment, is indexed as a separate document.
When an email is deleted, we also need to delete its attachments. How
shall we do this?
One simple implementation is to have an "id" field for each document. The
email text document will have a unique id, and the attachment document will
have two ids: its own unique id, and the containing email's id. When we
need to remove an email and all its attachments, we just remove all
documents that match the email's id, and this will include the main text
and the attachments.

By the way, the method is called "deleteDocuments" - doesn't that imply that
it's perfectly acceptable to delete many documents with one term?

--
Nadav Har'El                        | Sunday, Jul 8 2007, 22 Tammuz 5767
IBM Haifa Research Lab              |-----------------------------------------
                                    |I am not a complete idiot - some parts
http://nadav.harel.org.il           |are missing.
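
To make the id scheme described above concrete, here is a minimal sketch in
plain Java. It is a toy model and does not use Lucene: the class ToyIndex,
its methods, and the ids "mail-1", "att-1a" and so on are all hypothetical
names made up for illustration. It assumes each id value is stored as one
whole term (nothing is tokenized), and that deleting by a term removes every
live document carrying that term, which is the behaviour the message argues
for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index. Hypothetical code, NOT Lucene's API.
class ToyIndex {
    // One posting set per "field:value" term, holding document numbers.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Document number -> human-readable label.
    private final List<String> names = new ArrayList<>();
    // Document numbers that have not been deleted.
    private final Set<Integer> live = new HashSet<>();

    // Adds a document whose "id" field holds one or more whole-term values.
    int add(String name, String... ids) {
        int doc = names.size();
        names.add(name);
        live.add(doc);
        for (String id : ids) {
            postings.computeIfAbsent("id:" + id, k -> new HashSet<>()).add(doc);
        }
        return doc;
    }

    // Deletes every live document containing the term; returns the count removed.
    int deleteDocuments(String field, String value) {
        int removed = 0;
        Set<Integer> hits =
            postings.getOrDefault(field + ":" + value, Collections.<Integer>emptySet());
        for (Integer doc : hits) {
            if (live.remove(doc)) {
                removed++;
            }
        }
        return removed;
    }

    // Labels of the documents still in the index, in insertion order.
    List<String> liveNames() {
        List<String> out = new ArrayList<>();
        for (int doc = 0; doc < names.size(); doc++) {
            if (live.contains(doc)) {
                out.add(names.get(doc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("mail-1 text", "mail-1");
        // Each attachment carries its own id plus the containing email's id.
        index.add("mail-1 attachment a", "att-1a", "mail-1");
        index.add("mail-1 attachment b", "att-1b", "mail-1");
        index.add("mail-2 text", "mail-2");

        int removed = index.deleteDocuments("id", "mail-1");
        System.out.println(removed + " removed; left: " + index.liveNames());
    }
}
```

Deleting by an attachment's own id ("att-1a") would remove only that
attachment, while deleting by the email's id removes the text and every
attachment in one call. Whether Lucene's own deleteDocuments behaves this way
for multi-valued id fields is exactly the open question in this thread, so the
sketch only shows the intended semantics.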