lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: move TrieRange* to core?
Date Wed, 18 Mar 2009 20:33:48 GMT
I have no problem with it! Thanks!

What I would like to see fixed before moving it to core is the fact that an
additional helper field is needed for the trie values. If everything could
be in one field and the field were still sortable, it would be fine. For that,
the order of terms in the FieldCache would have to be fixed. As the trie terms
of highest precision currently order before all lower-precision terms, the
simplest fix would be to store only the first term that the TermEnum returns
for a document at that document's index in the FieldCache.

Another way would be to invert the order and let the higher-precision
terms appear last in the TermEnum. Both would be possible, but there
should be a clear statement of which term of a multi-term field is put into
the FieldCache (maybe configurable). See LUCENE-1372 for that.
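
The two options above can be modeled without Lucene. The following is a minimal sketch, not the FieldCache implementation: the class FirstTermCacheSketch, the Policy enum, and the "full:"/"low:" term labels are all hypothetical names invented for illustration. It walks the terms in enumeration order and shows how a "first term wins" or "last term wins" rule decides which value ends up in the per-document slot.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, Lucene-free model of filling a per-document cache from ordered terms. */
class FirstTermCacheSketch {

    /** Which term wins when a document has several terms in the same field. */
    enum Policy { FIRST_TERM, LAST_TERM }

    /**
     * @param termsInOrder terms in enumeration order, each mapped to the doc ids containing it
     * @param maxDoc       number of documents, i.e. the size of the cache array
     */
    static String[] fillCache(Map<String, int[]> termsInOrder, int maxDoc, Policy policy) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : termsInOrder.entrySet()) {
            for (int doc : entry.getValue()) {
                // FIRST_TERM keeps an already filled slot; LAST_TERM always overwrites it.
                if (policy == Policy.LAST_TERM || cache[doc] == null) {
                    cache[doc] = entry.getKey();
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        // Made-up labels: full-precision terms sort before the lower-precision term.
        Map<String, int[]> terms = new LinkedHashMap<>();
        terms.put("full:0005", new int[] {0});
        terms.put("full:0042", new int[] {1});
        terms.put("low:00", new int[] {0, 1});

        // prints [full:0005, full:0042]
        System.out.println(Arrays.toString(fillCache(terms, 2, Policy.FIRST_TERM)));
        // prints [low:00, low:00]
        System.out.println(Arrays.toString(fillCache(terms, 2, Policy.LAST_TERM)));
    }
}
```

With the current term order only FIRST_TERM leaves the sortable full-precision value in the cache; if the order were inverted, LAST_TERM would do the same.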

If all terms could be in one field, the API to TrieRange could be simpler
and easier on the GC. The trieCodeLong/Int() method would just
return a TokenStream that can be indexed using "new
Field(Name,TokenStream)", making more effective use of the Token's char
buffer during trie encoding (it could be reused). This is how Solr does it at
the moment (but with the additional allocation of the array). I do not like
the array allocations for each term and the whole trie encoding at the
moment (1x char[], 1x String[], additional copying, ...).
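
The buffer-reuse idea can be illustrated in plain Java. This is only a sketch under stated assumptions: it does not use Lucene's TokenStream or Token classes, the class ReusableTrieTermsSketch is hypothetical, and the encoding (a marker character followed by hex digits, treating the value as unsigned) is invented for illustration and is not the real trie encoding. What it shows is one char[] being overwritten for every precision level instead of allocating a char[] and a String[] per value.

```java
/** Hypothetical sketch: emits one term per precision level into a single reused char[]. */
class ReusableTrieTermsSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final char[] buffer = new char[17]; // 1 marker char + up to 16 hex digits
    private final int precisionStep;            // bits dropped per level
    private long value;
    private int shift;
    private int length;

    ReusableTrieTermsSketch(int precisionStep) {
        if (precisionStep < 4 || precisionStep > 64 || precisionStep % 4 != 0) {
            throw new IllegalArgumentException("precisionStep must be a multiple of 4 in [4, 64]");
        }
        this.precisionStep = precisionStep;
    }

    /** Starts a new value; no allocation happens here or in next(). */
    void reset(long value) {
        this.value = value;
        this.shift = 0;
        this.length = 0;
    }

    /** Overwrites the buffer with the next lower-precision term; false when exhausted. */
    boolean next() {
        if (shift >= 64) {
            return false;
        }
        int digits = (64 - shift) / 4;
        buffer[0] = (char) ('A' + shift / 4); // invented marker: 'A' means full precision
        long v = value >>> shift;
        for (int i = digits; i >= 1; i--) {
            buffer[i] = HEX[(int) (v & 0xF)];
            v >>>= 4;
        }
        length = digits + 1;
        shift += precisionStep;
        return true;
    }

    char[] buffer() { return buffer; }

    int length() { return length; }

    public static void main(String[] args) {
        ReusableTrieTermsSketch terms = new ReusableTrieTermsSketch(16);
        terms.reset(0x1234L);
        while (terms.next()) {
            // The String is created only for printing; an indexer could read the buffer directly.
            System.out.println(new String(terms.buffer(), 0, terms.length()));
        }
    }
}
```

A consumer that copies the characters out of the buffer before calling next() again never needs a per-term array, which is the saving the paragraph above is after.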

I would be happy to have it in core, and I could prepare the patch once the
above is fixed!

As for names: NumberUtils, IntRangeFilter, and LongRangeFilter are fine;
AbstractNumberRangeFilter is internal only (just to reduce code
duplication, like StringBuffer and StringBuilder in the JDK, which both
derive from an internal superclass invisible to the outside).
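
The hidden-superclass arrangement could look roughly like this. It is a sketch only: the class names follow the thread, but the bodies, the accepts() method, and the wrapper class RangeFilterNamingSketch are hypothetical, and a private nested class stands in here for what would be a package-private class in a real library.

```java
/** Hypothetical sketch of two public filters sharing a superclass that callers cannot name. */
class RangeFilterNamingSketch {

    /** Shared logic; private, so code outside this file cannot refer to the type. */
    private abstract static class AbstractNumberRangeFilter {
        private final long min;
        private final long max;

        AbstractNumberRangeFilter(long min, long max) {
            if (min > max) {
                throw new IllegalArgumentException("min > max");
            }
            this.min = min;
            this.max = max;
        }

        /** Inclusive range check used by both public subclasses. */
        final boolean inRange(long value) {
            return min <= value && value <= max;
        }
    }

    public static final class IntRangeFilter extends AbstractNumberRangeFilter {
        public IntRangeFilter(int min, int max) { super(min, max); }

        public boolean accepts(int value) { return inRange(value); }
    }

    public static final class LongRangeFilter extends AbstractNumberRangeFilter {
        public LongRangeFilter(long min, long max) { super(min, max); }

        public boolean accepts(long value) { return inRange(value); }
    }

    public static void main(String[] args) {
        System.out.println(new IntRangeFilter(1, 5).accepts(3));        // prints true
        System.out.println(new LongRangeFilter(10L, 20L).accepts(42L)); // prints false
    }
}
```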

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wednesday, March 18, 2009 9:02 PM
> To: java-dev@lucene.apache.org
> Subject: move TrieRange* to core?
> 
> I think we should move TrieRange* into core before 2.9?
> 
> It's received a lot of attention, from both developers (Uwe & Yonik did
> lots of iterations, and Solr is folding it in) and user interest.
> 
> It's a simpler & more scalable way to index numeric fields that you
> intend to sort and/or do range querying on; we can do away with tricky
> number padding.
> 
> Plus it's just plain cool :)
> 
> I also think we should change its name.  I know and love "trie", but
> it's a very technical term that's not immediately meaningful to users
> of Lucene's API.  Plus I've learned from doing too many renamings
> lately that it's best to try to get the name right at the start.
> 
> Maybe just NumberUtils, IntRangeFilter, LongRangeFilter,
> AbstractNumberRangeFilter?
> 
> Thoughts?
> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


