lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Per-field collators
Date Mon, 22 Oct 2007 17:27:29 GMT

On Oct 22, 2007, at 10:09 AM, Marvin Humphrey wrote:

> The conclusion I reached was that you needed to have a dedicated  
> TermEnum for each field, implying individual term dictionary files  
> (.tis, .tii).

I realized that I needed to explain this.

If KS allows users to supply Perl sort subs as collators, the cost
per comparison will be high, and that cost is paid at search time.
This doesn't scale well for large result sets.
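
To make the per-comparison cost concrete, here is a minimal sketch in Python rather than Perl. It wraps a user-supplied comparator and counts how often the sort calls it. The names `count_comparisons` and `case_insensitive` are hypothetical and are not part of KS or Lucene.

```python
from functools import cmp_to_key


def case_insensitive(a, b):
    # Hypothetical stand-in for a user-supplied collator (a Perl sort sub in KS).
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


def count_comparisons(items, collate):
    """Sort items with the given comparator and report how many times it ran."""
    calls = 0

    def counting(a, b):
        nonlocal calls
        calls += 1
        return collate(a, b)

    ordered = sorted(items, key=cmp_to_key(counting))
    return ordered, calls


if __name__ == "__main__":
    hits = ["pear", "Apple", "fig", "Banana", "cherry", "Date"]
    ordered, calls = count_comparisons(hits, case_insensitive)
    print(ordered, "comparator calls:", calls)
```

The comparator runs once per comparison the sort makes, so the number of callback invocations grows with the size of the result set being sorted.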

One solution is to move the sorting cost to index-time for individual  
fields.  Since KS has global field semantics, it's possible to  
associate a collator with a field name, and sort terms within the  
term dictionary by it.  However, using multiple collators within the  
same term dictionary is messy, because it's difficult to decide which  
one you should be using at any given point during a scan.  Using a  
dedicated TermEnum for each field cleans that up.
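
A rough sketch of the idea, in Python for brevity: each field gets its own term dictionary, sorted once at index time by that field's collator, so a scan over one field never has to choose between collators. The class `FieldTermDict`, the function `build_index`, and both collators are hypothetical illustrations, not the KS or Lucene API, and the in-memory lists stand in for per-field .tis/.tii files.

```python
from functools import cmp_to_key


def byte_order(a, b):
    # Plain code-point ordering.
    return (a > b) - (a < b)


def case_insensitive(a, b):
    # Hypothetical custom collator; ties fall back to byte order
    # so the resulting order is fully deterministic.
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y) or byte_order(a, b)


class FieldTermDict:
    """Term dictionary for one field, sorted by that field's collator."""

    def __init__(self, collate):
        self.collate = collate
        self.terms = []

    def build(self, terms):
        # The collator cost is paid here, once, at index time.
        self.terms = sorted(set(terms), key=cmp_to_key(self.collate))

    def scan(self):
        # Analogous to a dedicated TermEnum: only one collator applies.
        yield from self.terms


def build_index(field_collators, docs):
    """Build one term dictionary per field from docs of {field: [terms]}."""
    collected = {field: [] for field in field_collators}
    for doc in docs:
        for field, terms in doc.items():
            collected[field].extend(terms)
    index = {}
    for field, collate in field_collators.items():
        index[field] = FieldTermDict(collate)
        index[field].build(collected[field])
    return index


if __name__ == "__main__":
    docs = [
        {"title": ["banana", "Apple"], "id": ["b2", "A1"]},
        {"title": ["Cherry"], "id": ["a3"]},
    ]
    index = build_index({"title": case_insensitive, "id": byte_order}, docs)
    print(list(index["title"].scan()))
    print(list(index["id"].scan()))
```

With a single shared dictionary, the scan would need to switch collators whenever it crossed a field boundary; keeping one sorted structure per field avoids that bookkeeping.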

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


