lucene-java-user mailing list archives

From "no spam" <mrs.nos...@gmail.com>
Subject Re: updating index
Date Sat, 10 Mar 2007 16:20:27 GMT
BTW Erick this works brilliantly with UN_TOKENIZED.  SUPER fast :)

On 2/25/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
> to fetch it with TermDocs/TermEnum! The loop I posted works like this....
>
> for each term in the index for the field
>     if  this is one I want to update
>          use a TermDocs to get to that document and operate on it.
>
>
> But this is actually pretty silly. Your loop uses a better approach, except
> you're not using TermDocs correctly. Try
>
>      TermDocs tDocs = reader.termDocs();
>      for (Business biz : updates)
>        {
>            Term t = new Term("id", biz.getId());
>            tDocs.seek(t);
>            while (tDocs.next())
>            {
>                Document doc = reader.document(tDocs.doc());
>            }
>        }
>
> But TermDocs/TermEnum is looking at terms in the index. If you haven't
> indexed the term, you won't find it, so your Field.Index.NO is really
> hurting you here.
>
> Best
> Erick
>
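
To illustrate the point above about Field.Index.NO, here is a minimal toy model in plain Java. It is not Lucene code: the class IndexedVsStored, its IndexMode enum, and its methods are hypothetical names made up for this sketch. It only assumes what the message describes, namely that stored values and indexed terms are kept separately, so a term lookup can only find values that were indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (NOT Lucene): stored values and indexed terms live in separate structures. */
public class IndexedVsStored {
    /** Hypothetical stand-in for the Field.Index choice discussed in the thread. */
    enum IndexMode { NO, UN_TOKENIZED }

    // docId -> stored field values (roughly what fetching a document by number returns)
    private final List<Map<String, String>> stored = new ArrayList<>();
    // "field:value" -> docIds (roughly what a term lookup walks)
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Adds a one-field document and returns its document number. */
    int add(String field, String value, IndexMode mode) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        // Only indexed values become terms; IndexMode.NO skips this step entirely.
        if (mode == IndexMode.UN_TOKENIZED) {
            postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Looks up documents by term; sees only values that were indexed. */
    List<Integer> termDocs(String field, String value) {
        return postings.getOrDefault(field + ":" + value, Collections.emptyList());
    }

    /** Reads a stored value back by document number. */
    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        IndexedVsStored index = new IndexedVsStored();
        int a = index.add("id", "42", IndexMode.NO);
        index.add("id", "43", IndexMode.UN_TOKENIZED);
        System.out.println(index.termDocs("id", "42"));  // [] : stored but never indexed
        System.out.println(index.termDocs("id", "43"));  // [1]
        System.out.println(index.storedValue(a, "id"));  // 42 : still retrievable by doc number
    }
}
```

In this model the value "42" can still be read back once you have the document number, but no term lookup will ever return that document, which matches the empty TermDocs described in the message below.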
> On 2/24/07, no spam <mrs.nospam@gmail.com> wrote:
> >
> > I didn't fully understand your last post and why I wanted to do
> > IndexReader.terms() then IndexReader.termDocs().  Won't something like
> > this work?
> >
> >         for (Business biz : updates)
> >         {
> >             Term t = new Term("id", biz.getId()+"");
> >             TermDocs tDocs = reader.termDocs(t);
> >
> >             while (tDocs.next())
> >             {
> >                 Document doc = reader.document(tDocs.doc());
> >             }
> >         }
> >
> > But tDocs never contains any docs.   Is this because I've indexed my pk
> > like this:
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));
> >
> > instead of
> >
> > doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.UN_TOKENIZED));
> >
> > Mark
> >
> > On 2/21/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > I think you can get MUCH better efficiency by using TermEnum/TermDocs, but
> > > I think you need to index (UN_TOKENIZED) your primary key. I'm not certain
> > > of that now, and it'd be worth trying, but I'd be surprised if TermEnum
> > > worked with un-indexed data; I've always assumed that TermEnums only work
> > > on indexed fields.
> > >
> > > Anyway, your loop looks more like this...
> > >
> > > TermEnum terms = reader.terms(new Term("primarykey", ""));
> > > TermDocs tDocs = reader.termDocs();
> > >
> > > while (terms.next()) {
> > >    if (docsToUpdate.contains(terms.term().text())) {
> > >        tDocs.seek(terms.term());
> > >        if (tDocs.next()) {   // position on the first matching doc
> > >            // pseudocode: update the document numbered tDocs.doc()
> > >            writer.updateDocument(tDocs.doc());
> > >        }
> > >    }
> > > }
> > >
> > > NOTE: I've been fast and loose with edge conditions, like ensuring that
> > > while (terms.next()) doesn't skip the first term, so caveat emptor....
> > > This loop also assumes that there is one and only one document in your
> > > index with the primary key. Otherwise, you have to do some more work with
> > > the TermDocs class to process each document that has your primary key...
> > >
> > > This is similar to creating Lucene filters, which is very fast....
> > >
> > > Hope this helps
> > > Erick
> > >
> > >
> > >
> > >
> >
>
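
The edge conditions mentioned in the oldest message (not skipping the first term, stopping at the end of the field, and handling more than one document per key) can be sketched with a sorted map standing in for the term dictionary. This is a toy model in plain Java, not Lucene code: TermWalk, docsToUpdate, and the "field\u0000text" key layout are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy model (NOT Lucene) of walking the sorted terms of one field. */
public class TermWalk {
    /**
     * Returns the doc numbers whose term text for `field` is in `wanted`.
     * Keys are "field\u0000text", so they sort by field first, then by text.
     */
    static List<Integer> docsToUpdate(NavigableMap<String, List<Integer>> terms,
                                      String field, Set<String> wanted) {
        List<Integer> hits = new ArrayList<>();
        String prefix = field + "\u0000";
        // tailMap(prefix, true) begins AT the first term >= prefix,
        // so the first term of the field is examined rather than skipped.
        for (Map.Entry<String, List<Integer>> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // walked past the last term of this field
            }
            String text = e.getKey().substring(prefix.length());
            if (wanted.contains(text)) {
                hits.addAll(e.getValue()); // every doc with this key, not just one
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> terms = new TreeMap<>();
        terms.put("body\u00001", Arrays.asList(9));
        terms.put("id\u00001", Arrays.asList(0));
        terms.put("id\u00002", Arrays.asList(1, 3)); // duplicate primary key
        terms.put("id\u00003", Arrays.asList(2));
        terms.put("name\u00002", Arrays.asList(4));

        Set<String> wanted = new HashSet<>(Arrays.asList("1", "2"));
        System.out.println(docsToUpdate(terms, "id", wanted)); // [0, 1, 3]
    }
}
```

The loop checks the current entry before advancing, which is the toy equivalent of a do/while over a term enumeration that already starts on its first term, and the prefix check keeps it from running on into the next field.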
