lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <>
Subject Re: Lucene 2.9
Date Mon, 09 Mar 2009 13:58:45 GMT

>>Maybe we could do something similar to declare that agiven field uses Trie*, and with
what datatype.

With the current implementation you can at least test for the presence of a field called:

..which tells you some form of trie is used but could be extended to include precision step
value e.g.

But - overall, I can't help but feel we will struggle to offer facilities like this when there
is a lack of a formal schema for Lucene indexes. 
Solr obviously includes some form of index definition - my Lucene-based apps tend to use a
custom config and Luke could certainly benefit from some form of definition stored with the
Time for some standardised index metadata?

This trie/parser issue is an example of a broader issue for me.


----- Original Message ----
From: Michael McCandless <>
Sent: Monday, 9 March, 2009 13:10:32
Subject: Re: Lucene 2.9

Uwe Schindler wrote:

>> Or perhaps we should move Trie* into core Lucene, and then build a
>> real (ootb) integration with QueryParser.
> The problem is that the query parser does not know if a field is encoded as
> trie or is just a normal text token. Furthermore, the new trie API does not
> differentiate between dates, doubles, longs (same for 32bit) because every
> trie field is identical (it is the application's task to keep track on the
> encoding when indexing and searching, TrieRange only supports the conversion
> of these data types to sortable integers), but the "datatype" itself is not
> stored in index. Solr has support for this in its "schema", but for Lucene
> all fields are identical. For the query parser there is no possibility to
> differentiate between a long, double or date.

Could we add APIs to QueryParser so the application can state the disposition
toward certain fields?

EG QueryParser now tries to guess whether a range query's upper/lower bound
should be parsed as dates, and there are methods exposed to set the resolution
on a per-field basis.  Maybe we could do something similar to declare that a
given field uses Trie*, and with what datatype.

Just thinking aloud really... but since we haven't yet released Trie*, now (for 2.9)
is a good time to think hard about how we expose/integrate it... and making it
easier to use ootb seems important.


To unsubscribe, e-mail:
For additional commands, e-mail:


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message