From: ChadDavis
To: java-user@lucene.apache.org
Date: Mon, 10 Nov 2008 13:09:04 -0700
Subject: Re: incremental update of index

That's what I thought. So, that leads me to . . . is it necessarily all
that much faster to index in an incremental update fashion, rather than
just clobbering the old index?

On Mon, Nov 10, 2008 at 12:52 PM, Erick Erickson wrote:

> You have to have indexed something that uniquely identifies the
> document in order to know what the old one is. Really, this is
> the same question as updating, isn't it? If you could update
> a document in place, you'd have to know what document
> that was. If you know that information, you know which
> document to delete.
>
> Note that lucene has no built-in document recognition. If I
> add the same document to the index twice, Lucene will
> happily consider them two *separate* documents.
> You have to code your own notion of document meta-id (as distinct
> from the Lucene doc id). It could be the URL, the file path
> on disk, a document ID from your organization... the
> possibilities are endless. Which is why Lucene can't do that
> for you.
>
> Best
> Erick
>
> On Mon, Nov 10, 2008 at 2:22 PM, ChadDavis wrote:
>
> > In the FAQ's it says that you have to do a manual incremental update:
> >
> > How do I update a document or a set of documents that are already
> > indexed?
> > >
> > > There is no direct update procedure in Lucene. To update an index
> > > incrementally you must first *delete* the documents that were updated,
> > > and *then re-add* them to the index.
> > >
> >
> > How do I determine the existing document that matches the document I am
> > updating?
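
For anyone reading this thread later, the delete-then-re-add advice quoted above can be sketched in plain Java without touching Lucene at all. This is only a toy model: `ToyIndex`, `Doc`, `deleteByKey`, and `update` are hypothetical names made up for illustration and are not Lucene classes or methods. The `key` stands in for the application-chosen identifier Erick describes (a URL, a file path, and so on).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for an index. NOT Lucene's API; every name here is hypothetical. */
class ToyIndex {

    /** A stored document: an application-chosen key (e.g. a file path) plus its text. */
    static final class Doc {
        final String key;
        final String text;

        Doc(String key, String text) {
            this.key = key;
            this.text = text;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    /** Appends unconditionally, so adding the same key twice yields two entries. */
    void add(String key, String text) {
        docs.add(new Doc(key, text));
    }

    /** Removes every entry with the given key and returns how many were removed. */
    int deleteByKey(String key) {
        int removed = 0;
        for (Iterator<Doc> it = docs.iterator(); it.hasNext();) {
            if (it.next().key.equals(key)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** An "update" is delete-then-re-add, keyed on the application's own id. */
    void update(String key, String text) {
        deleteByKey(key);
        add(key, text);
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("/docs/a.txt", "first version");
        index.add("/docs/a.txt", "first version"); // same document added twice
        System.out.println(index.size());          // prints 2: two separate entries

        index.update("/docs/a.txt", "second version");
        System.out.println(index.size());          // prints 1: both old copies replaced
    }
}
```

In Lucene itself the key would be an indexed field, and `IndexWriter.updateDocument(Term, Document)` does the same delete-then-add in one call. Check the Javadoc for your version before relying on the details.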