From: "Newman, Billy"
To: "java-user@lucene.apache.org"
Date: Tue, 21 Apr 2009 09:12:52 -0400
Subject: RE: IndexWriter update method

Yeah, I was hoping to change the code to use the update method after I
upgraded from 1.4.3, but it doesn't look feasible. I will just continue to
find the doc and delete it, then re-insert it.

Thanks for all the help guys!

-----Original Message-----
From: Doron Cohen [mailto:cdoronc@gmail.com]
Sent: Tuesday, April 21, 2009 3:09 AM
To: java-user@lucene.apache.org
Subject: Re: IndexWriter update method

IndexWriter.deleteDocuments(Query query) may be handy too (but note that
it will delete *all* docs that match the query).

Doron

On Tue, Apr 21, 2009 at 2:28 AM, Erick Erickson wrote:

> I don't think you *can* create a Term that spans two fields. Perhaps
> you'd be better off just doing a search, getting the doc ID back then
> adding a new version of the document.
>
> You *could* think about reindexing your corpus and indexing an
> additional field that was the concatenation of the two fields you
> want to update by if that's a better solution.
>
> Best
> Erick
>
> On Mon, Apr 20, 2009 at 11:40 AM, Newman, Billy wrote:
>
> > What if your unique ID is a composite of two fields when you create
> > the document?
> >
> > I.E.
> > doc.add(new Field("partno", "123345",
> >     Field.Store.whatever, Field.Index.UN_TOKENIZED));
> > doc.add(new Field("storeLoc", "Springfield",
> >     Field.Store.whatever, Field.Index.UN_TOKENIZED));
> >
> > How do you create a Term for this? Is this possible? Is this the
> > correct way to create a document that has two fields?
> > If so I am a little lost on how to create a term to correctly find
> > this.
> >
> > Thanks,
> > Billy
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Friday, April 17, 2009 8:08 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: IndexWriter update method
> >
> > What you're missing is that the example has no unique ID; it wasn't
> > created with update in mind.
> >
> > There's no hidden magic for Lucene knowing *what* document you want
> > to have updated, you have to provide it yourself, and it should be
> > unique.
> >
> > Imagine a parts catalog, or an index of a directory tree. In the parts
> > catalog, you could identify the document by its part number, so you'd
> > probably index it something like
> >
> > doc.add(new Field("partno", "123345",
> >     Field.Store.whatever, Field.Index.UN_TOKENIZED));
> >
> > Indexing a directory tree you could use the complete file path
> > similarly.
> >
> > Now, each document will have one (and only one) partno, and it'll be
> > unique (you really don't want to tokenize this).
> >
> > To update, you'd form your term on the field "partno" and value
> > "123345", thus uniquely identifying the document you want replaced,
> > and use that term in your update statement.
> > Think of the Term as a unique key for the document that *you've*
> > deliberately put there.
> >
> > I'm pretty sure (but not positive) that if you update a document where
> > the term doesn't have any matches, you'll get a simple insert, but I
> > won't guarantee it.
> >
> > HTH
> > Erick
> >
> >
> > On Fri, Apr 17, 2009 at 9:28 PM, Newman, Billy wrote:
> >
> > > Ok I am still confused.
> > >
> > > Looking at the examples to index a document I would do something
> > > like the following:
> > >     Document document = new Document();
> > >     document.add(Field.UnStored("article", article));
> > >     document.add(Field.Text("comments", comments));
> > >     Analyzer analyzer = new StandardAnalyzer();
> > >     IndexWriter writer = new IndexWriter(indexDirectory, analyzer,
> > >         false);
> > >     writer.addDocument(document);
> > >     writer.optimize();
> > >     writer.close();
> > >
> > > Now let's say that the comments can change and when they do I want
> > > to update that document to contain the newly updated comments.
> > >
> > > So I would have to go back and check my index to see if that book
> > > already exists.
> > >     Query q = new QueryParser("article", analyzer).parse(querystr);
> > >     int hitsPerPage = 10;
> > >     IndexSearcher searcher = new IndexSearcher(index);
> > >     TopDocCollector collector = new TopDocCollector(hitsPerPage);
> > >     searcher.search(q, collector);
> > >     ScoreDoc[] hits = collector.topDocs().scoreDocs;
> > >     if (hits != null && hits.length > 0) {
> > >         // ?
> > >         // Then this already exists and I just want to update the
> > >         // comments section
> > >     }
> > >
> > > Does that make sense? Am I going about this wrong?
> > >
> > > Billy
> > >
> > >
> > > ________________________________________
> > > From: Tim Williams [williamstw@gmail.com]
> > > Sent: Friday, April 17, 2009 6:05 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: IndexWriter update method
> > >
> > > On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy wrote:
> > > > I am looking for info on how to use the IndexWriter.update method.
> > > > A short example of how to add a document and then later update
> > > > would be very helpful. I get lost because I can add a document
> > > > with just the document, but to update I need a document and a Term.
> > > > I am not really sure what a Term is since I did not use a Term to
> > > > create the document nor do I see it in any of the examples of
> > > > searching/adding.
> > >
> > > When you index the document, add an ID field that is unique. Then
> > > when you go to update the document the "Term" will be the ID of the
> > > document you wish to update. For example, you might add a URL as the
> > > unique ID, then to update it might look something like:
> > >
> > > writer.updateDocument(
> > >     new Term("id", "http://apache.org/lucene/index.htm"), doc);
> > >
> > >
> > > --tim
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > > This e-mail and any files transmitted with it may be proprietary and
> > > are intended solely for the use of the individual or entity to whom
> > > they are addressed. If you have received this e-mail in error please
> > > notify the sender.
> > > Please note that any views or opinions presented in this e-mail are
> > > solely those of the author and do not necessarily represent those of
> > > ITT Corporation. The recipient should check this e-mail and any
> > > attachments for the presence of viruses. ITT accepts no liability
> > > for any damage caused by any virus transmitted by this e-mail.
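
Erick's suggestion of indexing one extra field that concatenates the two
key fields can be sketched roughly as below. This is only an illustration
and not code from the thread: the field name "partKey", the '|' separator
and the ToyIndex class are all hypothetical, and ToyIndex is a plain-Java
stand-in so that the example runs without Lucene on the classpath. With a
real index the corresponding step would be
writer.updateDocument(new Term("partKey", key), doc), which deletes any
documents containing that term and then adds the new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: update-by-composite-key, with plain Java standing in for the index. */
class CompositeKeyUpdate {

    // Hypothetical separator; assumes it never occurs in partno or storeLoc.
    static final char SEP = '|';

    /** Builds the single value that would be stored in one untokenized field. */
    static String key(String partno, String storeLoc) {
        if (partno.indexOf(SEP) >= 0 || storeLoc.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("field value contains separator");
        }
        return partno + SEP + storeLoc;
    }

    /** Toy stand-in for an index: each "document" maps field name to value. */
    static final class ToyIndex {
        final List<Map<String, String>> docs = new ArrayList<>();

        /** Deletes every doc whose field equals value, then adds the new doc. */
        void update(String field, String value, Map<String, String> doc) {
            docs.removeIf(d -> value.equals(d.get(field)));
            docs.add(doc);
        }
    }

    /** Builds a document that carries both fields plus the combined key field. */
    static Map<String, String> part(String partno, String storeLoc, String comments) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("partno", partno);
        doc.put("storeLoc", storeLoc);
        doc.put("partKey", key(partno, storeLoc)); // hypothetical extra field
        doc.put("comments", comments);
        return doc;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> a = part("123345", "Springfield", "first version");
        Map<String, String> b = part("123345", "Shelbyville", "other store");
        Map<String, String> c = part("123345", "Springfield", "revised comments");

        index.update("partKey", a.get("partKey"), a); // no match yet: plain insert
        index.update("partKey", b.get("partKey"), b); // same partno, different store
        index.update("partKey", c.get("partKey"), c); // replaces only the Springfield doc

        System.out.println(index.docs.size()); // prints 2
    }
}
```

Because the update is a delete of every match followed by an add, a key
with no match behaves as a plain insert, and a key that is not unique
would remove every document sharing it. That is the reason for keeping
the combined field untokenized and unique per document.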