lucene-java-user mailing list archives

From "no spam" <mrs.nos...@gmail.com>
Subject Re: updating index
Date Sun, 25 Feb 2007 03:29:00 GMT
I didn't fully understand your last post, or why I would want to call
IndexReader.terms() and then IndexReader.termDocs(). Won't something like this
work?

        for (Business biz : updates)
        {
            Term t = new Term("id", biz.getId()+"");
            TermDocs tDocs = reader.termDocs(t);

            while (tDocs.next())
            {
                Document doc = reader.document(tDocs.doc());
            }
        }

But tDocs never contains any docs. Is this because I've indexed my primary
key like this:

 doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));

instead of

 doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.UN_TOKENIZED));
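
A toy model of the difference may help here. This is only a sketch under stated assumptions, not Lucene's actual implementation: the class ToyIndex and its methods are hypothetical, and only the mode names mirror Lucene's Field.Index constants. It assumes that a term lookup consults an inverted index (term to doc ids), and that a field added with Index.NO is kept as a stored value only and never enters that inverted index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index; NOT the Lucene API.
public class ToyIndex {
    // Mirrors the two Field.Index modes discussed in this thread.
    public enum IndexMode { NO, UN_TOKENIZED }

    // Inverted index: field name -> term text -> ids of docs containing it.
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    // Stored values: doc id -> field name -> original value.
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Adds a one-field document; the value is always stored, but it only
    // gets a postings entry when the field is indexed.
    public int addDocument(String field, String value, IndexMode mode) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        if (mode == IndexMode.UN_TOKENIZED) {
            // Untokenized: the whole value becomes exactly one term.
            postings.computeIfAbsent(field, f -> new TreeMap<>())
                    .computeIfAbsent(value, v -> new ArrayList<>())
                    .add(docId);
        }
        return docId;
    }

    // Term lookup: reads the inverted index only, never the stored values.
    public List<Integer> termDocs(String field, String text) {
        return postings.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(text, Collections.emptyList());
    }

    // Stored-value lookup by doc id.
    public String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex storedOnly = new ToyIndex();
        storedOnly.addDocument("id", "42", IndexMode.NO);
        System.out.println(storedOnly.termDocs("id", "42")); // prints []

        ToyIndex indexed = new ToyIndex();
        indexed.addDocument("id", "42", IndexMode.UN_TOKENIZED);
        System.out.println(indexed.termDocs("id", "42"));    // prints [0]
    }
}
```

In this model the stored-only document can still be fetched by doc id, but no term lookup can ever reach it, which matches the empty tDocs described above.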

Mark

On 2/21/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> I think you can get MUCH better efficiency by using TermEnum/TermDocs. But I
> think you need to index your primary key as UN_TOKENIZED (although now I'm
> not sure). I'd be surprised if TermEnum worked with un-indexed data. Still,
> it'd be worth trying, but I've always assumed that TermEnums only worked on
> indexed fields.
>
> Anyway, your loop looks more like this...
>
> TermEnum terms = reader.terms(new Term("primarykey", ""));
> TermDocs tDocs = reader.termDocs();
>
> while (terms.next()) {
>    if (docsToUpdate.contains(terms.term().text())) {
>        tDocs.seek(terms.term());
>        writer.updateDocument(tDocs.doc());
>    }
> }
>
> NOTE: I've been fast and loose with edge conditions, like ensuring that
> while (terms.next()) doesn't skip the first term, so caveat emptor. This
> loop also assumes that there is one and only one document in your index with
> the primary key. Otherwise, you have to do some more work with the TermDocs
> class to process each document that has your primary key.
>
> This is similar to creating Lucene filters, which is very fast....
>
> Hope this helps
> Erick
>
>
>
>
