Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Message-ID: <bd818c7b0703100820k6d2389b7rd992ba75dddbaad7@mail.gmail.com>
Date: Sat, 10 Mar 2007 11:20:27 -0500
From: "no spam" <mrs.nospam@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: updating index
In-Reply-To: <359a92830702250705g3bef1f46rf8b20a957f772193@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_6705_18089274.1173543627743"
References: <bd818c7b0702211442i78a202d7y8d527b12dbf441f6@mail.gmail.com>
	 <359a92830702211602k5214be53r81a81d776feec362@mail.gmail.com>
	 <bd818c7b0702241929l31c262e9l891bfeb486f94efc@mail.gmail.com>
	 <359a92830702250705g3bef1f46rf8b20a957f772193@mail.gmail.com>

------=_Part_6705_18089274.1173543627743
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

BTW Erick this works brilliantly with UN_TOKENIZED.  SUPER fast :)
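
For the archive, here is a toy sketch of why the Field.Index.NO version came
up empty. This is not Lucene code: ToyIndex and its methods are made-up names
for illustration, and the model assumes an index keeps stored values and the
term dictionary as two separate structures. Only indexed fields get an entry
in the term dictionary, so a term lookup on a stored-only field finds nothing,
even though the value can still be read back from the document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the Lucene API. It mimics the split between
// stored values and the term dictionary to show why an un-indexed
// field can be read back from a document but never found by term.
class ToyIndex {
    // docNum -> stored field values (roughly what reader.document(n) returns)
    private final List<Map<String, String>> stored = new ArrayList<>();
    // "field:text" -> doc numbers (roughly what a TermDocs iterates over)
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Adds a one-field document and returns its doc number.
    int add(String field, String value, boolean indexed) {
        int docNum = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (indexed) {
            // Un-tokenized: the whole value becomes a single term.
            postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>())
                    .add(docNum);
        }
        return docNum;
    }

    // Rough analogue of seeking a TermDocs to a term.
    List<Integer> termDocs(String field, String text) {
        return postings.getOrDefault(field + ":" + text, Collections.emptyList());
    }

    // Rough analogue of reading a stored field from a document.
    String storedValue(int docNum, String field) {
        return stored.get(docNum).get(field);
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        int a = idx.add("id", "42", false); // like Field.Index.NO
        idx.add("id", "43", true);          // like Field.Index.UN_TOKENIZED
        System.out.println(idx.termDocs("id", "42")); // []
        System.out.println(idx.termDocs("id", "43")); // [1]
        System.out.println(idx.storedValue(a, "id")); // 42
    }
}
```

The first lookup is the situation from my earlier post: the id is stored, so
it shows up when you load the document, but there is no term to seek to.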

On 2/25/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
> to fetch it with TermDocs/TermEnum! The loop I posted works like this....
>
> for each term in the index for the field
>     if  this is one I want to update
>          use a TermDocs to get to that document and operate on it.
>
>
> But this is actually pretty silly. Your loop uses a better approach, except
> you're not using TermDocs correctly. Try
>
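>      // one reusable TermDocs, obtained from the open IndexReader
>      TermDocs tDocs = reader.termDocs();
>      for (Business biz : updates)
>        {
>            Term t = new Term("id", biz.getId());
>            tDocs.seek(t);   // reposition on the postings for this id
>            while (tDocs.next())
>            {
>                Document doc = reader.document(tDocs.doc());
>            }
>        }
>      tDocs.close();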
>
> But TermDocs/TermEnum is looking at terms in the index. If you haven't
> indexed the term, you won't find it, so your Field.Index.NO is really
> hurting you here.
>
> Best
> Erick
>
> On 2/24/07, no spam <mrs.nospam@gmail.com> wrote:
> >
> > I didn't fully understand your last post, or why I would want to call
> > IndexReader.terms() and then IndexReader.termDocs().  Won't something like
> > this work?
> >
> >         for (Business biz : updates)
> >         {
> >             Term t = new Term("id", biz.getId()+"");
> >             TermDocs tDocs = reader.termDocs(t);
> >
> >             while (tDocs.next())
> >             {
> >                 Document doc = reader.document(tDocs.doc());
> >             }
> >         }
> >
> > But tDocs never contains any docs.   Is this because I've indexed my pk
> > like this:
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));
> >
> > instead of
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> >
> > Mark
> >
> > On 2/21/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > I think you can get MUCH better efficiency by using TermEnum/TermDocs.
> > > But I think you need to index (UN_TOKENIZED) your primary key, although
> > > now I'm not sure. I'd be surprised if TermEnum worked with un-indexed
> > > data. Still, it'd be worth trying, but I've always assumed that TermEnums
> > > only worked on indexed fields...
> > >
> > > Anyway, your loop looks more like this...
> > >
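> > > // reader is an open IndexReader, writer an open IndexWriter
> > > TermEnum terms = reader.terms(new Term("primarykey", ""));
> > > TermDocs tDocs = reader.termDocs();
> > >
> > > while (terms.next()) {
> > >    if (docsToUpdate.contains(terms.term().text())) {
> > >        tDocs.seek(terms.term());
> > >        if (tDocs.next()) {   // must advance before calling doc()
> > >            int docNum = tDocs.doc();
> > >            // rebuild the document, then call
> > >            // writer.updateDocument(terms.term(), newDoc);
> > >        }
> > >    }
> > > }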
> > >
> > > NOTE: I've been fast and loose with edge conditions, like ensuring that
> > > while (terms.next()) doesn't skip the first term, so caveat emptor....
> > > This loop also assumes that there is one and only one document in your
> > > index with the primary key. Otherwise, you have to do some more work with
> > > the TermDocs class to process each document that has your primary key...
> > >
> > > This is similar to creating Lucene filters, which is very fast....
> > >
> > > Hope this helps
> > > Erick
> > >
> > >
> > >
> > >
> >
>

------=_Part_6705_18089274.1173543627743--