lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Indexing synonyms for multiple words
Date Tue, 03 Mar 2009 15:40:21 GMT

Actually, the start position of each token is stored in the "normal"
Lucene index (in the *.prx files), not using payloads.

Payloads are entirely for per-token extensibility (i.e., core Lucene
doesn't use them by default): you'd have to create your own analyzer
to attach payloads to tokens, and then do something with them at
search time.

So I suggested you could store the end position of each token in the
payload, but then you'd need to implement a Query class to use this
during searching.
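A plain-Java sketch of the payload suggestion, under stated assumptions: `Posting`, `encodeEnd`, `endOf` and `phraseMatches` are hypothetical names, the payload is modeled as a 4-byte big-endian int holding the exclusive end position, and tokens without a payload are assumed to span one position. In real Lucene the encoding half would live in a custom analyzer and the matching half in a custom Query class; neither is shown here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model of the payload idea (hypothetical classes, not Lucene's API):
 * each posting keeps its start position, and its payload bytes hold the end position.
 */
public class EndPositionPayload {
    record Posting(String term, int start, byte[] payload) {}

    /** Index-time side: encode the exclusive end position as 4 payload bytes. */
    static byte[] encodeEnd(int end) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(end).array();
    }

    /** Search-time side: decode the end; a token without a payload spans one position. */
    static int endOf(Posting p) {
        return p.payload() == null ? p.start() + 1 : ByteBuffer.wrap(p.payload()).getInt();
    }

    static final List<Posting> DOC = new ArrayList<>();
    static {
        String[] words = {"AAA", "BBB", "WORD1", "WORD2", "EEE", "FFF", "GGG"};
        for (int i = 0; i < words.length; i++) DOC.add(new Posting(words[i], i, null));
        DOC.add(new Posting("SYN", 2, encodeEnd(4))); // SYN covers WORD1 WORD2 (positions 2-3)
    }

    /** Phrase match that requires each term to start where the previous term ends. */
    static boolean phraseMatches(String... terms) {
        for (Posting first : DOC) {
            if (first.term().equals(terms[0]) && chain(first, terms, 1)) return true;
        }
        return false;
    }

    private static boolean chain(Posting prev, String[] terms, int i) {
        if (i == terms.length) return true;
        for (Posting next : DOC) {
            if (next.term().equals(terms[i]) && next.start() == endOf(prev)
                    && chain(next, terms, i + 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("SYN", "EEE"));   // true: EEE starts where SYN ends
        System.out.println(phraseMatches("SYN", "WORD2")); // false: WORD2 starts inside SYN
    }
}
```

The point of the design is that the start position stays exactly where the index already puts it; only the extra end information rides along in the payload, and only the custom matching code ever reads it.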

Mike

Sumukh wrote:

>
> Thanks for your suggestion, Michael, and thanks to Uwe for clarifying.
>
> Payload is currently used to store only the start positions.
> What I gathered from your suggestion is that we could possibly
> store the end position, or span, or some other complex
> encoding in order to store the extra information.
> Am I right?
>
> --Sumukh
>
>
> Michael McCandless-2 wrote:
>>
>>
>> Since Lucene doesn't represent/store end position for a token, I don't
>> think the index can properly represent SYN spanning two positions?
>>
>> I suppose you could encode this into payloads, and create a custom
>> query that would look at the payload to enforce the constraint.
>>
>> Or, if you switch to doing SYN expansion only at runtime (not adding
>> it to the index), that might work.
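A minimal plain-Java sketch of query-time expansion, assuming a one-entry synonym table and a document indexed without SYN; `SYNONYMS`, `DOC`, `expand` and `phraseMatches` are hypothetical names, not Lucene APIs. Because SYN is rewritten into the phrase WORD1 WORD2 before matching, the index never has to describe a token that spans two positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of query-time synonym expansion (hypothetical names, not Lucene's API). */
public class QueryTimeSynonyms {
    // Synonym table: a single query term mapped to the multi-word phrase it stands for.
    static final Map<String, List<String>> SYNONYMS = Map.of("SYN", List.of("WORD1", "WORD2"));

    // Indexed text with no synonyms added: position i holds DOC.get(i).
    static final List<String> DOC = List.of("AAA", "BBB", "WORD1", "WORD2", "EEE", "FFF", "GGG");

    /** Rewrites the query, splicing in the full phrase for each synonym. */
    static List<String> expand(List<String> query) {
        List<String> out = new ArrayList<>();
        for (String term : query) out.addAll(SYNONYMS.getOrDefault(term, List.of(term)));
        return out;
    }

    /** Exact phrase match: the terms must occupy consecutive positions in DOC. */
    static boolean phraseMatches(List<String> phrase) {
        return Collections.indexOfSubList(DOC, phrase) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("SYN", "EEE")));                  // [WORD1, WORD2, EEE]
        System.out.println(phraseMatches(expand(List.of("SYN", "EEE"))));   // true
        System.out.println(phraseMatches(expand(List.of("SYN", "WORD2")))); // false
    }
}
```

The trade-off is that every query pays for the rewrite, and the synonym table must be available at search time rather than baked into the index.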
>>
>> Mike
>>
>> Uwe Schindler wrote:
>>
>>> I think his problem is that "SYN" is a synonym for the phrase "WORD1
>>> WORD2". Using these positions, a phrase like "SYN WORD2" would also
>>> match (and queries that depend on word order would have other problems).
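Uwe's false match can be reproduced with a toy position-only index in plain Java. The term-to-positions map hard-codes Mike's proposed positions (SYN and WORD1 both at 2, WORD2 at 3) for the sentence AAA BBB WORD1 WORD2 EEE FFF GGG; `NaivePhrase` and `phraseMatches` are hypothetical names, and the check only mimics the adjacency rule of an exact phrase query (term i at position start + i).

```java
import java.util.List;
import java.util.Map;

/** Toy position-only index (hypothetical, not Lucene's API). */
public class NaivePhrase {
    // term -> start positions; SYN is stacked on WORD1's position with increment 0
    static final Map<String, List<Integer>> INDEX = Map.of(
            "AAA", List.of(0), "BBB", List.of(1),
            "WORD1", List.of(2), "SYN", List.of(2), "WORD2", List.of(3),
            "EEE", List.of(4), "FFF", List.of(5), "GGG", List.of(6));

    /** Exact phrase match: term i must occur at position start + i. */
    static boolean phraseMatches(String... terms) {
        for (int start : INDEX.getOrDefault(terms[0], List.of())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = INDEX.getOrDefault(terms[i], List.of()).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("SYN", "WORD2")); // true: the false positive
        System.out.println(phraseMatches("SYN", "EEE"));   // false: EEE is two positions after SYN
    }
}
```

The second line shows the mirror-image failure: "SYN EEE", which the original sentence should satisfy, does not match, because nothing in the index says SYN ends where WORD2 ends.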
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>>>> Sent: Monday, March 02, 2009 4:07 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: Indexing synonyms for multiple words
>>>>
>>>>
>>>> Shouldn't WORD2's position be 1 more than your SYN?
>>>>
>>>> I.e., don't you want these positions?
>>>>
>>>>   WORD1  2
>>>>   WORD2  3
>>>>   SYN    2
>>>>
>>>> The position is the starting position of the token; Lucene doesn't
>>>> store an ending position.
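Mike's table can be reproduced with a small plain-Java model of a token stream: each token carries a position increment, and its position is the running sum of increments, counting from 0. The `Tok` record and `positions` method are hypothetical illustrations, not Lucene classes; the model shows that only a start position is ever produced.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a token stream; Tok and positions() are hypothetical, not Lucene classes. */
public class PositionIncrements {
    /** A token plus its position increment (1 = next position, 0 = same position as previous). */
    record Tok(String term, int posInc) {}

    /** Returns each token's absolute start position: the running sum of increments, from 0. */
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1; // so that the first token, with increment 1, lands on position 0
        for (Tok t : tokens) {
            pos += t.posInc();
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> stream = List.of(
                new Tok("AAA", 1), new Tok("BBB", 1),
                new Tok("WORD1", 1), new Tok("SYN", 0), // SYN stacked on WORD1
                new Tok("WORD2", 1), new Tok("EEE", 1));
        List<Integer> pos = positions(stream);
        for (int i = 0; i < stream.size(); i++) {
            System.out.println(stream.get(i).term() + " " + pos.get(i));
        }
        // Only start positions exist; nothing here records that SYN ends after WORD2.
    }
}
```

Running it prints WORD1 2, SYN 2 and WORD2 3 among the other tokens, matching the table above.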
>>>>
>>>> Mike
>>>>
>>>> Sumukh wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm fairly new to Lucene. I'd like to know how we can index synonyms
>>>>> for multiple words.
>>>>>
>>>>> This is the scenario:
>>>>>
>>>>> Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.
>>>>>
>>>>> Now assume the two words combined, WORD1 WORD2, can be replaced by
>>>>> another word, SYN.
>>>>>
>>>>> If I place SYN after WORD1 with positionIncrement set to 0, WORD2 will
>>>>> follow SYN, which is incorrect; if I place it after WORD2 instead, SYN
>>>>> will follow WORD1, which is equally wrong.
>>>>>
>>>>> If any of you have solved a similar problem, I'd be thankful if you
>>>>> could shed some light on the solution.
>>>>>
>>>>> Regards,
>>>>> Sumukh
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Indexing-synonyms-for-multiple-words-tp22289069p22300656.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>

