lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: TrieRange
Date Sat, 07 Feb 2009 01:08:26 GMT
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> The encoding of the values
> into two different field names does the trick for the whole range query.
> Removing the code that generates the field in exactly that way would remove
> the idea behind TrieRangeFilter.

What about allowing developers to specify the exact name of both
fields (or to give both fields the same name)?
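
As a rough illustration of what caller-chosen field names could look
like, here is a minimal Java sketch. It is not the TrieUtils API: the
class TrieFieldNamingSketch, the encode method, and the term format
(a two-digit shift prefix followed by hex digits) are all hypothetical
and assumed only for this example. The full-precision term goes to
whatever main field the caller names, and the lower-precision terms go
to whatever helper field the caller names, which may be the same field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: shows field names as plain parameters.
final class TrieFieldNamingSketch {

    /** One (field name, term text) pair that a caller could add to a document. */
    static final class FieldTerm {
        final String field;
        final String term;

        FieldTerm(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            return field + ":" + term;
        }
    }

    /**
     * Produces one term per precision level. The shift-0 (full precision)
     * term is assigned to mainField; every coarser term is assigned to
     * helperField. Passing the same name for both is allowed, because the
     * shift prefix keeps terms of different precisions distinct.
     * Sign handling is omitted: term order matches numeric order only for
     * non-negative values in this sketch.
     */
    static List<FieldTerm> encode(long value, int precisionStep,
                                  String mainField, String helperField) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        List<FieldTerm> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            long prefix = value >>> shift; // drop the low 'shift' bits
            String term = String.format("%02d_%016x", shift, prefix);
            out.add(new FieldTerm(shift == 0 ? mainField : helperField, term));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two distinct field names chosen by the caller.
        for (FieldTerm t : encode(0x1234L, 8, "price", "price_helper")) {
            System.out.println(t);
        }
        // The same name for both fields.
        for (FieldTerm t : encode(0x1234L, 8, "price", "price")) {
            System.out.println(t);
        }
    }
}
```

With separate names, only the full-precision term lands in the main
field, so that field still holds a single term per document and stays
sortable. With one shared name, every precision ends up in the same
field, which costs that property.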

> The only thing that could be changed would
> be to make the suffix on the helper field variable. There is a lot of
> optimization behind this (see TrieRangeFilter comments in the central
> splitRange method). This depends on the order and so the names of the field
> and its helper.

Are you saying that the actual field names cause a predictable
performance difference?  This should not be the case.

> Very early versions of the trie algorithm were using an extra
> field name for each slice, as you call it. I removed it completely later (no
> helper field at all), but then trie fields could not be sorted any more. The
> extra helper field is the middle way.
>
> Maybe there should be more documentation about that by a more native
> speaker.

I understand how it works and how one would need to configure it such
that it be sortable if needed - but my point was really much more
about allowing people to do things differently if needed.

>> For example, whether to encode values in two fields and exactly what
>> those fields are named, seems like it should be under developer
>> control.  Likewise, developers should be in control of creating and
>> adding fields to documents and setting other properties like
>> omitTerms, omitNorms, etc.
>
> In my opinion there is no sense in having norms or such things on trie
> fields. They should only be queried using TrieRangeFilter/Query, for that
> norms and TF are not needed (as they are numerical values, for what do you
> need the norms?).

Norms also fold in index-time boosts.
What if someone wanted to put payloads or use some other future
indexing method on the terms?

It's more about not forcing decisions on all developers.
There would be no way for me to incorporate the Trie stuff into Solr
as it stands - I'd need to develop custom code that duplicated code
from TrieUtils because not enough flexibility is exposed.  I'm not
averse to doing so - just pointing out the downsides.

-Yonik
