lucene-dev mailing list archives

From "David Balmain" <dbalmain...@gmail.com>
Subject Re: Global field semantics
Date Mon, 10 Jul 2006 18:00:00 GMT
On 7/11/06, Yonik Seeley <yseeley@gmail.com> wrote:
> On 7/10/06, David Balmain <dbalmain.ml@gmail.com> wrote:
> > I don't think declaring all fields up front is necessary for
> > substantial optimizations. I've found that the key to some really good
> > optimizations is having constant field numbers. That is, once a field
> > is added to the index it is assigned a field number and it keeps
> > that field number for the life of the index.
>
> I can sort of see how this would work when adding documents to a single index.
> What about merging indices via IndexWriter.addIndexes()?  I guess
> this would require keeping the current way of merging around as a
> fallback?

That's right. I still need to work on this. Currently you need to spec
each index beforehand to make sure they have the same fields. But
it's just a matter of using the old merge model for adding
heterogeneous indexes.
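
A minimal sketch of how that fallback decision might look, assuming each index's field table is a plain list of field names whose position is the field number. The class and method names here are hypothetical and not taken from any released code. If both tables match, the fast path applies; otherwise the source field numbers are remapped by name, which is roughly what a name-based merge has to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; list position stands in for the field number.
final class MergeFieldMapSketch {
    /** True when both indexes assign identical numbers to identical field names. */
    static boolean sameNumbering(List<String> target, List<String> source) {
        return target.equals(source);
    }

    /**
     * Fallback for heterogeneous indexes: map each source field number to a
     * number in the target. Unknown fields are appended to the target, so
     * field numbers already assigned in the target never change.
     */
    static int[] remap(List<String> target, List<String> source) {
        int[] map = new int[source.size()];
        for (int i = 0; i < source.size(); i++) {
            String name = source.get(i);
            int number = target.indexOf(name);
            if (number < 0) {
                number = target.size();
                target.add(name);
            }
            map[i] = number;
        }
        return map;
    }
}
```

With a target of [title, content] and a source of [content, author], the numbering differs, so remap would send source field 0 to target field 1 and append "author" as field 2.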

> Does this mess up opening a MultiReader on multiple indices
> constructed at different times?  This is a common thing for people to
> do.

Same as above. I still need to fix this. I have yet to release all
these new changes.

> > This allows one
> > FieldInfos object per index instead of one per segment.
>
> So when a new segment is written, the global FieldInfos may need to be updated.
> I guess this should be written after the new segment and before the
> "segments" file.

That's exactly how I do it. I did consider putting it all in the
"segments" file but I decided not to. I can't remember why right now.
So I have a "segments" file and a "fields" file, the "segments" file
being written last.
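
The write order can be sketched as follows. This is only an illustration under stated assumptions: the index directory is modeled as an in-memory map, and the ".data" file name and the file contents are made up. The only thing taken from the description above is the order: segment data first, then "fields", then "segments" last, so a reader that starts from "segments" never sees a segment whose fields are missing from the global table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index directory.
final class CommitOrderSketch {
    final Map<String, String> files = new LinkedHashMap<>();
    final List<String> writeLog = new ArrayList<>();   // records the order of writes
    private final List<String> segments = new ArrayList<>();

    /** Writes a new segment, then the global field table, then the segment list. */
    void commit(String segmentName, List<String> fieldNames) {
        segments.add(segmentName);
        write(segmentName + ".data", "(segment contents)");   // made-up file name
        write("fields", String.join("\n", fieldNames));        // line number = field number
        write("segments", String.join("\n", segments));        // written last
    }

    private void write(String name, String content) {
        files.put(name, content);
        writeLog.add(name);
    }
}
```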

> >  As I mentioned
> > earlier this greatly optimizes the merging of term vectors and stored
> > fields. The only problem I could find with this solution is that
> > fields are no longer in alphabetical order in the term dictionary but
> > I couldn't think of a use-case where this is necessary although I'm
> > sure there probably is one.
>
> Isn't an ordered term dictionary necessary to do lookups?

Terms are alphabetically sorted, just not the fields. So if you add a
"title" field and then a "content" field they'd have the numbers 0 and
1 respectively. Now if the title field has the terms "alpha" and
"bravo" and the "content" field has the terms "apple" and "banana"
then they'd be ordered like this:

0:alpha
0:bravo
1:apple
1:banana

instead of like this:

content:apple
content:banana
title:alpha
title:bravo

Notice the terms are correctly ordered in both but the fields aren't.
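
The two orderings can be expressed as two comparators over the same terms. This is a sketch only: the Term class and comparator names are hypothetical, and it uses the "title"/"content" example from above rather than any real term dictionary format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the two term dictionary orderings.
final class TermOrderSketch {
    static final class Term {
        final int fieldNumber;
        final String fieldName;
        final String text;

        Term(int fieldNumber, String fieldName, String text) {
            this.fieldNumber = fieldNumber;
            this.fieldName = fieldName;
            this.text = text;
        }
    }

    /** Order by constant field number, then by term text. */
    static final Comparator<Term> BY_FIELD_NUMBER =
        Comparator.comparingInt((Term t) -> t.fieldNumber).thenComparing((Term t) -> t.text);

    /** Order by field name, then by term text. */
    static final Comparator<Term> BY_FIELD_NAME =
        Comparator.comparing((Term t) -> t.fieldName).thenComparing((Term t) -> t.text);

    /** Returns "field:text" keys in the given order; the field is shown as number or name. */
    static List<String> sorted(List<Term> terms, Comparator<Term> order, boolean showNumber) {
        List<Term> copy = new ArrayList<>(terms);
        copy.sort(order);
        List<String> keys = new ArrayList<>();
        for (Term t : copy) {
            keys.add((showNumber ? String.valueOf(t.fieldNumber) : t.fieldName) + ":" + t.text);
        }
        return keys;
    }

    /** The example terms, deliberately out of order. */
    static List<Term> sampleTerms() {
        return Arrays.asList(
            new Term(1, "content", "banana"),
            new Term(0, "title", "bravo"),
            new Term(1, "content", "apple"),
            new Term(0, "title", "alpha"));
    }
}
```

Sorting sampleTerms() with BY_FIELD_NUMBER gives the first listing and BY_FIELD_NAME gives the second. Either way, the terms within a single field stay sorted, so a lookup for a given field and term can still search an ordered range.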

Dave


