lucene-dev mailing list archives

From: David Balmain
Subject: Re: Global field semantics
Date: Mon, 10 Jul 2006 17:39:17 GMT

On 7/11/06, Chuck Williams wrote:
> David Balmain wrote on 07/10/2006 01:04 AM:
> > The only problem I could find with this solution is that
> > fields are no longer in alphabetical order in the term dictionary but
> > I couldn't think of a use-case where this is necessary although I'm
> > sure there probably is one.
> So presumably fields are still contiguous, you keep a pointer to where
> each field starts, and terms within the field remain in alphabetical order?

Actually yes, that is how I did it, although I'm no longer sure it's
the best way. I was hoping that having a pointer to the start of each
field would give some good performance gains in searching, but that
turned out not to be the case. You really only save a couple of
iterations in the getIndexOffset method.
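
A minimal sketch of why the per-field pointer saves so little, assuming
the term index is a sorted in-memory array that is binary searched. The
class name, the sample data, and the indexOffset helper are all
hypothetical and are not the actual getIndexOffset code. Restricting the
search to one field's slice only removes the first few halving steps of
a search over the whole array.

```java
class FieldStartIndexDemo {
    // Hypothetical index terms, grouped by field number (0, 0, 0, 1, 1, 2).
    // Within each field the term texts are in sorted order.
    static final String[] TEXT = {"ant", "bee", "cat", "apple", "pear", "zoo"};

    // FIELD_START[f] is the position of the first index term of field f.
    // The final entry is a sentinel equal to TEXT.length.
    static final int[] FIELD_START = {0, 3, 5, 6};

    // Returns the position of the greatest index term in the given field
    // that is <= text, or -1 if every index term in that field is greater.
    static int indexOffset(int field, String text) {
        int lo = FIELD_START[field];          // jump straight to the field
        int hi = FIELD_START[field + 1] - 1;  // last term of the field
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (TEXT[mid].compareTo(text) <= 0) {
                result = mid;                 // candidate; look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

With N index terms in total, an unrestricted binary search needs about
log2(N) steps, so skipping to the field start can only remove a few of
them.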

To make things easier, though, you can leave the
TermInfosWriter/Reader almost as they are. The only difference is that
you store field numbers in the index rather than field names, and when
you compare terms while scanning the index, you compare field numbers
rather than field names.
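
The comparison described above can be sketched roughly as follows. This
is an illustration only: NumberedTerm and the field numbering
("title" = 0, "body" = 1) are hypothetical, and the term text is
compared with plain String.compareTo as a simplifying assumption. It
also shows the side effect mentioned earlier in the thread: fields sort
by number, so "title" comes before "body".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FieldNumberTermDemo {
    // Hypothetical stand-in for a dictionary term keyed by field number.
    static final class NumberedTerm implements Comparable<NumberedTerm> {
        final int fieldNumber;
        final String text;

        NumberedTerm(int fieldNumber, String text) {
            this.fieldNumber = fieldNumber;
            this.text = text;
        }

        // Compare field numbers first, then the term text within a field.
        @Override
        public int compareTo(NumberedTerm other) {
            int byField = Integer.compare(fieldNumber, other.fieldNumber);
            return byField != 0 ? byField : text.compareTo(other.text);
        }

        @Override
        public String toString() {
            return fieldNumber + ":" + text;
        }
    }

    // Sorts a few sample terms and returns them as "fieldNumber:text".
    static List<String> sortedDictionary() {
        // Assumed numbering: "title" = 0, "body" = 1.
        List<NumberedTerm> terms = new ArrayList<>();
        terms.add(new NumberedTerm(1, "apple"));
        terms.add(new NumberedTerm(0, "zebra"));
        terms.add(new NumberedTerm(0, "lucene"));
        terms.add(new NumberedTerm(1, "index"));
        Collections.sort(terms);

        List<String> out = new ArrayList<>();
        for (NumberedTerm t : terms) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Terms stay contiguous per field and alphabetical within a field; only
the order of the fields themselves changes.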

I don't know if I've described it very well, but I hope that makes sense.



PS. By the way, I don't know if I made this clear, but the 5x speedup
I was talking about comes during indexing. The performance improvement
as far as search is concerned wasn't what I had hoped. It is a little
faster, but the bottleneck really comes from reading the documents
from the index. To alleviate that, I've added lazy field loading,
which seems to work well. Actually, I've set it up so that I can read
excerpts from fields without even loading the whole field, so
highlighting is super fast.
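
One way to picture lazy field loading with excerpts, as a rough sketch
only. LazyField, its byte-array store, and the bytesRead counter are
hypothetical and do not describe the real implementation. It assumes
the stored field is located by a byte offset and length, and that the
text is single-byte (ASCII) so byte offsets and character offsets
coincide.

```java
import java.nio.charset.StandardCharsets;

class LazyFieldDemo {
    // Hypothetical lazy field: remembers where its bytes live in the
    // store and decodes them only when asked.
    static final class LazyField {
        private final byte[] store;  // stands in for the stored-fields file
        private final int offset;    // where this field's bytes begin
        private final int length;    // how many bytes the field occupies
        private String cached;       // full value, decoded on first use
        int bytesRead = 0;           // bookkeeping for the demonstration

        LazyField(byte[] store, int offset, int length) {
            this.store = store;
            this.offset = offset;
            this.length = length;
        }

        // Decodes the whole field the first time it is requested.
        String value() {
            if (cached == null) {
                cached = new String(store, offset, length, StandardCharsets.UTF_8);
                bytesRead += length;
            }
            return cached;
        }

        // Decodes only the requested slice, clamped to the field bounds.
        String excerpt(int start, int len) {
            int from = Math.max(0, Math.min(start, length));
            int n = Math.max(0, Math.min(len, length - from));
            bytesRead += n;
            return new String(store, offset + from, n, StandardCharsets.UTF_8);
        }
    }
}
```

A highlighter that already knows the match position could call
excerpt() for a small window around it and never touch the rest of the
field.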
