Date: Sat, 10 Mar 2007 11:20:27 -0500
From: "no spam"
To: java-user@lucene.apache.org
Subject: Re: updating index

BTW Erick, this works brilliantly with UN_TOKENIZED. SUPER fast :)

On 2/25/07, Erick Erickson wrote:
>
> Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
> to fetch it with TermDocs/TermEnum! The loop I posted works like this....
>
> for each term in the index for the field
>     if this is one I want to update
>         use a TermDocs to get to that document and operate on it.
>
> But this is actually pretty silly. Your loop uses a better approach, except
> you're not using TermDocs correctly. Try
>
> // get an unpositioned TermDocs once and reuse it for every lookup
> TermDocs tDocs = reader.termDocs();
> for (Business biz : updates)
> {
>     Term t = new Term("id", biz.getId());
>     tDocs.seek(t);
>     while (tDocs.next())
>     {
>         Document doc = reader.document(tDocs.doc());
>     }
> }
>
> But TermDocs/TermEnum is looking at terms in the index. If you haven't
> indexed the term, you won't find it, so your Field.Index.NO is really
> hurting you here.
>
> Best
> Erick
>
> On 2/24/07, no spam wrote:
> >
> > I didn't fully understand your last post and why I wanted to do
> > IndexReader.terms() then IndexReader.termDocs(). Won't something like
> > this work?
> >
> > for (Business biz : updates)
> > {
> >     Term t = new Term("id", biz.getId()+"");
> >     TermDocs tDocs = reader.termDocs(t);
> >
> >     while (tDocs.next())
> >     {
> >         Document doc = reader.document(tDocs.doc());
> >     }
> > }
> >
> > But tDocs never contains any docs. Is this because I've indexed my pk
> > like this:
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));
> >
> > instead of
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES,
> >     Field.Index.UN_TOKENIZED));
> >
> > Mark
> >
> > On 2/21/07, Erick Erickson wrote:
> > >
> > > I think you can get MUCH better efficiency by using TermEnum/TermDocs.
> > > But I think you need to index (UN_TOKENIZED) your primary key (although
> > > now I'm not sure. But I'd be surprised if TermEnum worked with
> > > un-indexed data. Still, it'd be worth trying but I've always assumed
> > > that TermEnums only worked on indexed fields....).....
> > >
> > > Anyway, your loop looks more like this...
> > >
> > > TermEnum terms = reader.terms(new Term("primarykey", ""));
> > > TermDocs tDocs = reader.termDocs();
> > >
> > > while (terms.next()) {
> > >     if (docsToUpdate.contains(terms.term().text())) {
> > >         tDocs.seek(terms.term());
> > >         if (tDocs.next()) {
> > >             // tDocs.doc() is the number of the document to update
> > >         }
> > >     }
> > > }
> > >
> > > NOTE: I've been fast and loose with edge conditions, like ensuring that
> > > while (terms.next()) doesn't skip the first term, so caveat emptor....
> > > This loop also assumes that there is one and only one document in your
> > > index with the primary key. Otherwise, you have to do some more work
> > > with the TermDocs class to process each document that has your primary
> > > key...
> > >
> > > This is similar to creating Lucene filters, which is very fast....
> > >
> > > Hope this helps
> > > Erick
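
The point the thread settles on (a term lookup only sees fields that were indexed, not fields that were merely stored) can be sketched with a toy model in plain Java. This is not Lucene code: ToyIndex, addDocument, termDocs and storedValue are hypothetical names made up for the illustration, and the "indexed" flag only loosely stands in for the difference between Field.Index.NO and Field.Index.UN_TOKENIZED.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only, NOT Lucene. Every name here is hypothetical. */
class ToyIndex {
    // Stored field values per document (roughly what Field.Store.YES keeps).
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Postings: "field:value" -> document numbers (only for indexed fields).
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Adds a one-field document and returns its document number. */
    int addDocument(String field, String value, boolean indexed) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (indexed) {
            // The whole value becomes a single term, as with an untokenized field.
            postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Term lookup consults only the postings, never the stored values. */
    List<Integer> termDocs(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new ArrayList<>());
    }

    /** Stored values can still be read back by document number. */
    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int storedOnly = index.addDocument("id", "17", false); // stored, not indexed
        index.addDocument("id", "42", true);                   // stored and indexed
        System.out.println(index.termDocs("id", "17"));          // prints []
        System.out.println(index.termDocs("id", "42"));          // prints [1]
        System.out.println(index.storedValue(storedOnly, "id")); // prints 17
    }
}
```

In this model the stored-only document can be read back once you know its number, but no term lookup will ever lead to it, which matches the empty TermDocs described above.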