lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: posting list strings
Date Mon, 15 Jul 2013 02:48:10 GMT
Is there a Trie-based term index? Seems like this would be smaller, and 
very fast on non-leading wildcards.
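
To make the question concrete, here is a minimal sketch of a character trie that answers trailing-wildcard (prefix) lookups such as "luc*". It is not how Lucene stores its terms; the class TermTrie and its method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a plain character trie, not a Lucene class.
final class TermTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so matches come back in term order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm;
    }

    private final Node root = new Node();

    // Adds one term, creating nodes only for characters not seen on this path yet.
    public void add(String term) {
        Node node = root;
        for (char ch : term.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.isTerm = true;
    }

    // Returns all stored terms starting with the prefix, i.e. the query "prefix*".
    public List<String> termsWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        Node node = root;
        for (char ch : prefix.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return matches; // no stored term shares this prefix
            }
        }
        collect(node, new StringBuilder(prefix), matches);
        return matches;
    }

    // Depth-first walk below the prefix node, rebuilding each term as it goes.
    private static void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isTerm) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            current.append(child.getKey());
            collect(child.getValue(), current, out);
            current.setLength(current.length() - 1);
        }
    }

    public static void main(String[] args) {
        TermTrie trie = new TermTrie();
        for (String term : new String[] {"lucene", "lucid", "luck", "solr"}) {
            trie.add(term);
        }
        System.out.println(trie.termsWithPrefix("luc")); // prints [lucene, lucid, luck]
    }
}
```

Shared prefixes are stored once, which is where the space saving would come from. The lookup cost depends on the prefix length plus the number of matches, not on the total number of terms.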

On 07/09/2013 02:34 PM, Uwe Schindler wrote:
> Hi,
>
> You can replace the terms with their hashes directly in the analyzer chain. Just write a
> custom TermToBytesRefAttribute that hashes each term to a constant-length byte[] (using an
> AttributeFactory)! :-) This would give you all the features of hashed, constant-length terms,
> but you would lose prefix and wildcard queries. In fact, NumericTokenStream does this for
> numeric terms!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Adrien Grand [mailto:jpountz@gmail.com]
>> Sent: Tuesday, July 09, 2013 11:25 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: posting list strings
>>
>> Hi,
>>
>> Lucene stores the string because it may need it to run prefix or range
>> queries. We don't have a hash-based terms dictionary right now but I know
>> some people wrote one since they don't need support for these queries, see
>> for instance the Earlybird paper[1]. Then if you can find a perfect hashing
>> function, you can just replace your terms by their hash.
>>
>> [1]
>> http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
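
The hashing idea in the quoted replies can be sketched with the JDK alone. The example below maps each term to a constant-length byte[] by truncating a SHA-256 digest to 8 bytes. The class TermHasher, the method hashTerm, and the 8-byte length are assumptions made for illustration. This is not a Lucene API, and wiring it into a custom TermToBytesRefAttribute is not shown. A truncated digest is also not a perfect hash, so collisions are unlikely but possible.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only: not part of Lucene.
final class TermHasher {
    // Assumed length; every hashed term occupies exactly this many bytes.
    static final int HASH_LENGTH = 8;

    // Hashes the UTF-8 bytes of a term and keeps the first HASH_LENGTH bytes.
    public static byte[] hashTerm(String term) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(term.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, HASH_LENGTH);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] indexed = hashTerm("lucene");
        byte[] queried = hashTerm("lucene");
        byte[] prefix = hashTerm("lucen");

        // The same term always produces the same bytes, so exact-match lookup works.
        System.out.println(Arrays.equals(indexed, queried));
        // A prefix of the term produces unrelated bytes of the same length.
        System.out.println(Arrays.equals(indexed, prefix));
    }
}
```

Because the bytes for "lucen" bear no relation to the bytes for "lucene", the sorted order of the original strings is gone, which is why prefix, wildcard, and range queries stop working once terms are replaced by their hashes.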



