lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: updating index
Date Sun, 25 Feb 2007 15:05:21 GMT
Yes, I'm pretty sure you have to index the field (UN_TOKENIZED) to be able
to fetch it with TermDocs/TermEnum. The loop I posted works like this:

for each term in the index for the field
    if  this is one I want to update
         use a TermDocs to get to that document and operate on it.


But this is actually pretty silly. Your loop uses a better approach, except
you're not using TermDocs correctly. Try

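     // 'reader' is your open IndexReader; reuse one TermDocs and seek() per id
     TermDocs tDocs = reader.termDocs();
     try {
         for (Business biz : updates)
         {
             // Term text must be a String, so convert the id
             Term t = new Term("id", String.valueOf(biz.getId()));
             tDocs.seek(t);
             while (tDocs.next())
             {
                 Document doc = reader.document(tDocs.doc());
             }
         }
     } finally {
         tDocs.close();
     }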

But TermDocs/TermEnum is looking at terms in the index. If you haven't
indexed the term, you won't find it, so your Field.Index.NO is really
hurting you here.
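
To make the stored-versus-indexed distinction concrete, here is a toy model in plain Java. It is NOT Lucene code and not how Lucene is implemented; ToyIndex and its methods are made-up names for illustration only. It just shows that a value which is stored but never added to the term dictionary can be read back from a document you already have, yet can't be used to find that document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index (hypothetical, for illustration; not Lucene's implementation). */
class ToyIndex {
    // Stored field values per document, roughly what Field.Store.YES keeps.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Term dictionary: "field:value" -> document numbers. Only indexed
    // fields land here, roughly what Field.Index.UN_TOKENIZED provides.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Adds a one-field document and returns its document number. */
    int addDocument(String field, String value, boolean indexed) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (indexed) {
            postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Loose analogue of seeking a TermDocs: document numbers for a term. */
    List<Integer> termDocs(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new ArrayList<>());
    }

    /** Loose analogue of reader.document(n).get(field). */
    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int storedOnly = index.addDocument("id", "42", false); // like Field.Index.NO
        index.addDocument("id", "43", true);                   // like UN_TOKENIZED

        System.out.println(index.termDocs("id", "42"));         // prints []
        System.out.println(index.termDocs("id", "43"));         // prints [1]
        System.out.println(index.storedValue(storedOnly, "id")); // prints 42
    }
}
```

Document 0 still carries its id as a stored value, but there is no term for it, so the lookup comes back empty. That is the same symptom as a TermDocs that never returns any docs.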

Best
Erick

On 2/24/07, no spam <mrs.nospam@gmail.com> wrote:
>
> I didn't fully understand your last post and why I wanted to do
> IndexReader.terms() then IndexReader.termDocs().  Won't something like
> this work?
>
>         for (Business biz : updates)
>         {
>             Term t = new Term("id", biz.getId()+"");
>             TermDocs tDocs = reader.termDocs(t);
>
>             while (tDocs.next())
>             {
>                 Document doc = reader.document(tDocs.doc());
>             }
>         }
>
> But tDocs never contains any docs.   Is this because I've indexed my pk
> like this:
>
> doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));
>
> instead of
>
> doc.add(new Field("id", biz.getId(), Field.Store.YES,
> Field.Index.UN_TOKENIZED));
>
> Mark
>
> On 2/21/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > I think you can get MUCH better efficiency by using TermEnum/TermDocs,
> > but I think you need to index (UN_TOKENIZED) your primary key. I'm not
> > completely sure of that, but I'd be surprised if TermEnum worked with
> > un-indexed data. Still, it'd be worth trying, but I've always assumed
> > that TermEnums only work on indexed fields.
> >
> > Anyway, your loop looks more like this...
> >
> > TermEnum terms = reader.terms(new Term("primarykey", ""));
> > TermDocs tDocs = reader.termDocs();
> >
> > while (terms.next()) {
> >    if (docsToUpdate.contains(terms.term().text())) {
> >        tDocs.seek(terms.term());
> >        if (tDocs.next()) {
> >            // tDocs.doc() is the number of the document to update
> >        }
> >    }
> > }
> >
> > NOTE: I've been fast and loose with edge conditions, like ensuring that
> > while (terms.next()) doesn't skip the first term, so caveat emptor.
> > This loop also assumes that there is one and only one document in your
> > index with the primary key. Otherwise, you have to do some more work
> > with the TermDocs class to process each document that has your primary
> > key.
> >
> > This is similar to creating Lucene filters, which is very fast....
> >
> > Hope this helps
> > Erick
> >
> >
> >
> >
>
