lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Document ID shuffling under 2.3.x (on merge?)
Date Wed, 12 Mar 2008 13:42:59 GMT
I certainly found that lazy loading changed my speed dramatically, but
that was on a particularly field-heavy index.

I wonder if TermEnum/TermDocs would be fast enough on an indexed
(UN_TOKENIZED???) field for a unique id.
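
A rough sketch of how that might look for the filter BitSet case, not something that has been run against a real index. The DocIdLookup interface and the class name are hypothetical. Against a 2.3 index, the lookup would presumably wrap IndexReader.termDocs(new Term("uid", id)) and return the first doc(), with "uid" being an assumed field name. Here an in-memory map stands in for the index so the example is self-contained:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a TermDocs lookup on an untokenized unique-id field.
interface DocIdLookup {
    /** Returns the current doc id for the given unique id, or -1 if absent. */
    int docIdFor(String uniqueId);
}

class UniqueIdFilterSketch {
    /** Resolves each unique id to its current doc id and sets that bit. */
    static BitSet buildBits(Iterable<String> uniqueIds, DocIdLookup lookup, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : uniqueIds) {
            int doc = lookup.docIdFor(id);
            // Skip ids that are not in the index (or are out of range).
            if (doc >= 0 && doc < maxDoc) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // In-memory map standing in for the index's term dictionary.
        Map<String, Integer> fakeIndex = new HashMap<>();
        fakeIndex.put("a17", 0);
        fakeIndex.put("b42", 3);
        fakeIndex.put("c99", 5);

        DocIdLookup lookup = id -> fakeIndex.getOrDefault(id, -1);

        // Pretend these ids came back from the database query.
        List<String> fromDatabase = Arrays.asList("b42", "c99", "missing");
        BitSet bits = buildBits(fromDatabase, lookup, 6);
        System.out.println(bits);
    }
}
```

Since doc ids can shift when segments merge, bits built this way would need to be rebuilt against each newly opened reader rather than cached across reopens.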

Mostly, I'm hoping you'll try this and tell me if it works so I don't have
to sometime <G>....

Erick

On Tue, Mar 11, 2008 at 9:26 PM, Daniel Noll <daniel@nuix.com> wrote:

> On Wednesday 12 March 2008 09:53:58 Erick Erickson wrote:
> > But to me, it always seems...er...fraught to even *think* about relying
> > on doc ids. I know you've been around the block with Lucene, but do you
> > have a compelling reason to use the doc ID and not your own unique ID?
>
> From memory it was around a factor of 10 times slower to use a text field
> for this. I haven't tested it recently, and retrieving the Document should
> be slightly faster now that we have FieldSelector, but it certainly won't
> be faster, since to get the document you need the ID in the first place.
>
> For single documents it wasn't a problem, the use cases are:
>  1. Bulk database operations based on the matched documents.
>  2. Creating a filter BitSet based on a database query.
>
> Effectively this is required because Lucene offered no way to update a
> Document after it was indexed; if it had that feature we would never have
> needed a database. ;-)
>
> Daniel
