lucene-java-user mailing list archives

From Sriram Sankar <san...@gmail.com>
Subject Re: posting list strings
Date Mon, 15 Jul 2013 03:11:29 GMT
The large majority of terms in my index are not text terms.  For example, I
have connection terms.  Suppose user 543 and user 664 are connected.  Then
the doc corresponding to user 543 will have a term connection:664 indexed.
Prefix matching is not useful on such terms, and ideally I would not want
to pay the overhead for it.

How exactly is prefix matching done in Lucene?  Is there any doc describing
the details?
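
As background for the question above, here is a minimal sketch of how prefix matching can work over any sorted set of terms: seek to the first term at or after the prefix, then scan forward while terms still start with it. This is a plain-Java illustration using a TreeSet, not Lucene's actual terms dictionary code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SortedPrefixSketch {

    // Collects every term that starts with the given prefix.
    // Relies on the set being sorted: all matches form one contiguous run
    // that begins at the first term >= prefix.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the contiguous run of matches
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add("connection:543");
        terms.add("connection:664");
        terms.add("connection:67");
        terms.add("name:sriram");

        // prints [connection:664, connection:67]
        System.out.println(prefixMatches(terms, "connection:6"));
    }
}
```

The sketch only works because the original term strings are kept in sorted order, which is the property that a hash-based dictionary would give up.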

Thanks,

Sriram.
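
The hashing idea from the quoted replies below can be sketched in plain Java, without any Lucene classes: map each term such as connection:664 to a constant-length byte[] and index that instead of the string. The class name, the choice of SHA-256, and the 8-byte truncation are all assumptions made for illustration. A truncated digest is not a perfect hash, so collisions remain possible in principle.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class HashedTermSketch {

    // Assumed fixed term length in bytes; chosen only for this example.
    static final int HASH_BYTES = 8;

    // Hashes a term to a constant-length byte[] by truncating its SHA-256 digest.
    static byte[] hashTerm(String term) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(term.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, HASH_BYTES);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = hashTerm("connection:664");
        byte[] b = hashTerm("connection:664");
        byte[] c = hashTerm("connection:665");

        System.out.println(a.length);            // always HASH_BYTES
        System.out.println(Arrays.equals(a, b)); // same term, same bytes
        System.out.println(Arrays.equals(a, c)); // neighbouring ids share no usable prefix
    }
}
```

Exact lookups still work because equal terms produce equal bytes, but the hashed bytes of connection:664 and connection:665 are unrelated, which is why prefix and wildcard queries are lost.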



On Sun, Jul 14, 2013 at 7:48 PM, Lance Norskog <goksron@gmail.com> wrote:

> Is there a Trie-based term index? Seems like this would be smaller, and
> very fast on non-leading wildcards.
>
>
> On 07/09/2013 02:34 PM, Uwe Schindler wrote:
>
>> Hi,
>>
>> You can replace the terms by their hashes directly in the analyzer chain.
>> Just write a custom TermToBytesRefAttribute that hashes the term to a
>> constant-length byte[] (using an AttributeFactory)! :-) This would give you
>> all the features of hashed, constant-length terms, but you would lose prefix
>> and wildcard queries. In fact, NumericTokenStream does this for numeric
>> terms!
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>  -----Original Message-----
>>> From: Adrien Grand [mailto:jpountz@gmail.com]
>>> Sent: Tuesday, July 09, 2013 11:25 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: posting list strings
>>>
>>> Hi,
>>>
>>> Lucene stores the string because it may need it to run prefix or range
>>> queries. We don't have a hash-based terms dictionary right now, but I know
>>> some people wrote one since they don't need support for these queries; see
>>> for instance the Earlybird paper[1]. Then if you can find a perfect hashing
>>> function, you can just replace your terms by their hash.
>>>
>>> [1] http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
>>>
>>> --
>>> Adrien
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
