lucene-java-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: scalability w/ number of fields
Date Wed, 06 Apr 2005 16:59:37 GMT
Thanks Doug, your previous comment led us to consider compound field
types of the form compound:"name=value".  Open ended range queries
also need some manipulation for this scheme to work.
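A minimal sketch of the range-query manipulation mentioned above. It assumes terms sort by Java's String.compareTo, that the compound field's terms look like name=value, and that '=' never appears in a field name. The class and method names are hypothetical, not Lucene API. The point it illustrates: an open upper end on the original field cannot stay open on the compound field. It has to stop at the end of the "name=" prefix.

```java
/**
 * Hypothetical sketch: per-field range bounds mapped onto one compound field
 * whose terms look like "name=value". Assumes String.compareTo term order and
 * that '=' never occurs inside a field name. Not Lucene API.
 */
final class CompoundRange {
    static final char SEP = '=';

    /** Inclusive lower bound; a null value means the range is open below. */
    static String lowerBound(String field, String lowerValue) {
        // "name=" is the smallest possible term carrying this field's prefix.
        return field + SEP + (lowerValue == null ? "" : lowerValue);
    }

    /** Upper bound; inclusive when a value is given, exclusive when open. */
    static String upperBound(String field, String upperValue) {
        if (upperValue != null) {
            return field + SEP + upperValue;
        }
        // Every term starting with "name=" sorts strictly below "name>",
        // because '>' is the character immediately after '='.
        return field + (char) (SEP + 1);
    }

    /** True if a compound term lies in [lower, exclusiveUpper). */
    static boolean inOpenEndedRange(String term, String lower, String exclusiveUpper) {
        return term.compareTo(lower) >= 0 && term.compareTo(exclusiveUpper) < 0;
    }

    public static void main(String[] args) {
        String lo = lowerBound("price", "100");
        String hi = upperBound("price", null);
        System.out.println(lo + " .. " + hi + " (exclusive)");
        System.out.println(inOpenEndedRange("price=250", lo, hi));  // true
        System.out.println(inOpenEndedRange("pricey=5", lo, hi));   // false: other field
        System.out.println(inOpenEndedRange("quantity=9", lo, hi)); // false: other field
    }
}
```

Values are still compared as strings, so "price=99" sorts above "price=100"; numeric values would need a fixed-width encoding (for example zero-padding) for such ranges to follow numeric order.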

> Yes, this is an ugly hack, but it can make a huge performance
> difference.  The problem is that Lucene stores norm values in an array,
> when, in cases like yours, a sparse data structure might be more sensible.
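To make the dense-versus-sparse point concrete, here is a hypothetical sketch (not Lucene's classes) contrasting a per-field byte array with a map keyed by document number. The default value reported for documents that lack the field is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of dense and sparse per-field norm storage. */
final class NormsSketch {
    /** Dense layout: one byte for every document, whether or not it has the field. */
    static final class DenseNorms {
        private final byte[] norms;
        DenseNorms(int maxDoc) { norms = new byte[maxDoc]; }
        void set(int doc, byte norm) { norms[doc] = norm; }
        byte get(int doc) { return norms[doc]; }
        int entries() { return norms.length; }
    }

    /** Sparse layout: only documents that contain the field get an entry. */
    static final class SparseNorms {
        private final Map<Integer, Byte> norms = new HashMap<>();
        private final byte missing; // assumed value for documents without the field
        SparseNorms(byte missing) { this.missing = missing; }
        void set(int doc, byte norm) { norms.put(doc, norm); }
        byte get(int doc) { return norms.getOrDefault(doc, missing); }
        int entries() { return norms.size(); }
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        DenseNorms dense = new DenseNorms(maxDoc);
        SparseNorms sparse = new SparseNorms((byte) 0);
        // A rarely used field: only 3 of a million documents contain it.
        for (int doc : new int[] {7, 42, 999_999}) {
            dense.set(doc, (byte) 120);
            sparse.set(doc, (byte) 120);
        }
        System.out.println(dense.entries() + " vs " + sparse.entries()); // 1000000 vs 3
        System.out.println(dense.get(42) == sparse.get(42));             // true
        System.out.println(dense.get(43) == sparse.get(43));             // true (both 0)
    }
}
```

A HashMap entry costs far more than one byte, so the sparse form only pays off when a field appears in a small fraction of documents, which is the situation with thousands of rarely populated fields.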

Ahhh, "There's a norm file for each indexed field with a byte for each
document."
This obviously impacts segment merging... what about query performance? 
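Since each indexed field costs one norm byte per document, the size of a segment's norms is the product of the two counts, and a merge has to rewrite all of it. The 10,000 fields and 1,000,000 documents below are illustrative numbers, not measurements from this index; the helper name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: norm storage for one segment, one byte per doc per indexed field. */
final class NormBytes {
    static long normBytes(long indexedFields, long maxDoc) {
        return indexedFields * maxDoc;
    }

    public static void main(String[] args) {
        long bytes = normBytes(10_000, 1_000_000);
        System.out.println(bytes); // 10000000000
        // Locale.ROOT keeps the decimal point stable across JVM locales.
        System.out.println(String.format(Locale.ROOT, "%.1f GiB", bytes / (double) (1L << 30))); // 9.3 GiB
    }
}
```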

A question regarding stored-only fields (and having say 10,000 of
those)... I notice that stored and indexed field names are both listed
in the .fnm segment file.  Are there any performance critical places
in Lucene where this list is walked linearly?  I assume it's loaded
into an array in memory, so access by fieldnum is O(1).  It also makes the
FieldNum VInts bigger, but I don't see that as having a big effect.
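For the VInt question, a sketch of the variable-length integer layout described in Lucene's file-format documentation: 7 data bits per byte, with the high bit set when more bytes follow. It is a standalone reimplementation for illustration, not a call into Lucene. Field numbers 0 to 127 fit in one byte, while with 10,000 fields most field numbers need two.

```java
import java.io.ByteArrayOutputStream;

/** Standalone reimplementation of the VInt byte layout, for measuring encoded sizes. */
final class VIntSketch {
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit low-order 7 bits with the continuation bit while more bits remain.
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i); // final byte, continuation bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVInt(127).length);    // 1
        System.out.println(writeVInt(128).length);    // 2
        System.out.println(writeVInt(9_999).length);  // 2: highest number with 10,000 fields
        System.out.println(writeVInt(16_384).length); // 3
    }
}
```

So growing from a handful of fields to 10,000 adds at most one byte per stored field value, which is consistent with the guess that the effect is small.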

I'll try and keep the list informed as we get more numbers (and maybe
try out other things like generic or compound fields).

-Yonik

On Apr 6, 2005 12:28 PM, Doug Cutting <cutting@apache.org> wrote:
> Yonik Seeley wrote:
> > They are all indexed (and they all need to be under the current design).
> 
> As I mentioned before, Lucene will not perform well with a large number
> of indexed fields.  If these are not tokenized fields, then a simple way
> to reduce the number of indexed fields is to move the field name into
> the value.  Instead of adding <fieldX, valueY> and <fieldZ, valueA>, add
> <generic, fieldX-valueY> and <generic, fieldZ-valueA>.  This should
> perform quite well.  You'll also need to manipulate queries accordingly.
> 
> A similar method can work for tokenized fields.  Simply write a
> TokenFilter that appends a field name to the front of tokens.
> 
> Yes, this is an ugly hack, but it can make a huge performance
> difference.  The problem is that Lucene stores norm values in an array,
> when, in cases like yours, a sparse data structure might be more sensible.
> 
> Doug
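The generic-field trick Doug describes can be sketched in plain Java without depending on any particular Lucene version: fold the field name into the term text for untokenized values, and prefix each token for tokenized ones (the job his suggested TokenFilter would do inside an analyzer). GENERIC, SEP, genericTerm and prefixTokens are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of folding field names into term text so many logical
 * fields share one indexed field (and therefore one norms array).
 */
final class GenericFieldSketch {
    static final String GENERIC = "generic";
    static final char SEP = '-'; // assumed never to occur in a field name

    /** Untokenized case: <fieldX, valueY> becomes <generic, "fieldX-valueY">. */
    static String genericTerm(String field, String value) {
        return field + SEP + value;
    }

    /** Tokenized case: what a prefixing TokenFilter would emit for one field. */
    static List<String> prefixTokens(String field, List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(genericTerm(field, token));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new TreeMap<>();
        doc.put("fieldX", "valueY");
        doc.put("fieldZ", "valueA");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            // Both pairs land in the single indexed field "generic".
            System.out.println(GENERIC + ": " + genericTerm(e.getKey(), e.getValue()));
        }
        // Queries are rewritten the same way: fieldZ:valueA becomes generic:"fieldZ-valueA".
        System.out.println(prefixTokens("title", Arrays.asList("scalability", "fields")));
        // prints [title-scalability, title-fields]
    }
}
```

Whatever separator is chosen must never appear in a field name; otherwise field "a" with value "b-c" and field "a-b" with value "c" would both produce the term "a-b-c".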


