lucene-java-user mailing list archives

From Sumukh <sumukh.gho...@gmail.com>
Subject Re: Indexing synonyms for multiple words
Date Tue, 03 Mar 2009 01:13:39 GMT

Thanks for your suggestion Michael and thanks to Uwe for clarifying.

As I understand it, the index currently stores only the start position
of each token. What I gathered from your suggestion is that we could use
a payload to store the end position, the span length, or some other
encoding of the extra information.
Am I right?
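
To check my understanding, here is a rough sketch in plain Java. It is not
Lucene's API: Token, spanLength, analyze, sampleTokens and matchesPhrase are
names I made up for illustration, and the "payload" is just an int telling
how many positions a token covers. It only models the idea that a custom
query could require the next term to start where the previous one ends.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model for illustration only; none of these names are Lucene API.
final class SpanPayloadSketch {

    static final class Token {
        final String term;
        final int position;   // start position (what the index stores)
        final int spanLength; // assumed payload: number of positions covered

        Token(String term, int position, int spanLength) {
            this.term = term;
            this.position = position;
            this.spanLength = spanLength;
        }
    }

    // Converts position increments into absolute start positions.
    static List<Token> analyze(String[] terms, int[] increments, int[] spans) {
        List<Token> tokens = new ArrayList<>();
        int position = -1; // so a first increment of 1 lands on position 0
        for (int i = 0; i < terms.length; i++) {
            position += increments[i];
            tokens.add(new Token(terms[i], position, spans[i]));
        }
        return tokens;
    }

    // The sentence from my first mail, with SYN stacked on WORD1 (increment 0).
    static List<Token> sampleTokens() {
        String[] terms = {"AAA", "BBB", "WORD1", "SYN", "WORD2", "EEE", "FFF", "GGG"};
        int[] increments = {1, 1, 1, 0, 1, 1, 1, 1};
        int[] spans = {1, 1, 1, 2, 1, 1, 1, 1}; // SYN covers WORD1 and WORD2
        return analyze(terms, increments, spans);
    }

    // Two-term phrase check. Without the payload, "second" must start one
    // position after "first"; with it, "second" must start where "first" ends.
    static boolean matchesPhrase(List<Token> tokens, String first, String second,
                                 boolean usePayload) {
        for (Token a : tokens) {
            if (!a.term.equals(first)) continue;
            int expected = a.position + (usePayload ? a.spanLength : 1);
            for (Token b : tokens) {
                if (b.term.equals(second) && b.position == expected) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = sampleTokens();
        for (boolean usePayload : new boolean[] {false, true}) {
            System.out.println("usePayload=" + usePayload
                + " \"SYN WORD2\": " + matchesPhrase(tokens, "SYN", "WORD2", usePayload)
                + ", \"SYN EEE\": " + matchesPhrase(tokens, "SYN", "EEE", usePayload));
        }
    }
}
```

In this model, with start positions only, "SYN WORD2" matches and "SYN EEE"
does not. Once the span length is taken into account it is the other way
round, which is the behaviour I am after.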

--Sumukh


Michael McCandless-2 wrote:
> 
> 
> Since Lucene doesn't store an end position for a token, I don't
> think the index can properly represent SYN spanning two positions.
> 
> I suppose you could encode this into payloads, and create a custom  
> query that would look at the payload to enforce the constraint.
> 
> Or, if you switch to doing SYN expansion only at runtime (not adding  
> it to the index), that might work.
> 
> Mike
> 
> Uwe Schindler wrote:
> 
>> I think his problem is that "SYN" is a synonym for the phrase "WORD1
>> WORD2". With these positions, a phrase like "SYN WORD2" would also
>> match (and other queries that depend on word order would have
>> similar problems).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>> -----Original Message-----
>>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>>> Sent: Monday, March 02, 2009 4:07 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Indexing synonyms for multiple words
>>>
>>>
>>> Shouldn't WORD2's position be 1 more than your SYN?
>>>
>>> Ie, don't you want these positions?:
>>>
>>>    WORD1  2
>>>    WORD2  3
>>>    SYN 2
>>>
>>> The position is the starting position of the token; Lucene doesn't
>>> store an ending position.
>>>
>>> Mike
>>>
>>> Sumukh wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm fairly new to Lucene. I'd like to know how we can index synonyms
>>>> for multiple words.
>>>>
>>>> This is the scenario:
>>>>
>>>> Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.
>>>>
>>>> Now assume the two words combined WORD1 WORD2 can be replaced by
>>>> another word SYN.
>>>>
>>>> If I place SYN after WORD1 with positionIncrement set to 0, WORD2
>>>> will follow SYN, which is incorrect; and the other way round if I
>>>> place it after WORD2.
>>>>
>>>> If any of you have solved a similar problem, I'd be thankful if you
>>>> could shed some light on the solution.
>>>>
>>>> Regards,
>>>> Sumukh
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
View this message in context: http://www.nabble.com/Indexing-synonyms-for-multiple-words-tp22289069p22300656.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

