lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Scores between words. Boosting?
Date Mon, 09 Mar 2009 14:28:03 GMT
Hmmm, I have some inklings of an idea, but can we take a step back?   
Can you explain the problem you are trying to solve at a higher level  
(instead of the current solution)?  I imagine it is something related  
to co-occurrence analysis.


On Mar 8, 2009, at 8:05 AM, liat oren wrote:

> Hi Grant,
>
> No, you can only have two words - the score is between two words.
>
> "cat dog" and "dog cat" is equivalent, it will actually always be  
> "cat dog"
> - going by alphabetic order.
>
> About the boosting, I read a bit about it - but couldn't find how it  
> can
> help me, unless I change every appearance of the word dog to have  
> also cat
> and animal using the weight of the score.
> So, for example, every word will appear 10 times from what it is -  
> if apple
> appears 1, I will do the boosting so it appears 10 times.
> If dog appears, then it will also have cat twice (0.2*10) and animal 5
> times(0.5*10).
>
> But I hope to have another better solution.
>
>
> Thanks
> 2009/3/8 Grant Ingersoll <gsingers@apache.org>
>
>> Hi Liat,
>>
>> Some questions inline below.
>>
>> On Mar 8, 2009, at 5:49 AM, liat oren wrote:
>>
>> Hi,
>>>
>>> I have scores between words, for example - dog and animal have a  
>>> score of
>>> 0.5 (and not 0), dog and cat have a score of 0.2, etc.
>>> These scores are stored in an index:
>>> Doc1: field words: dog animal
>>>       field score: 0.5
>>> Doc2: field words: dog cat
>>>       field score: 0.2
>>>
>>> If the user searches for the word dog - I would like that  
>>> documents that
>>> contain the word animal or cat will also get a good score (that  
>>> will take
>>> into account the 0.5 and 0.2).
>>>
>>
>> Is it always the case that these come in pairs?  In other words,  
>> would you
>> ever have:
>> field words: dog cat animal
>> score: 0.9
>>
>> Also, is the following equivalent, or would it have a different  
>> score:
>> field words: cat dog
>> score: 0.2
>>
>>
>>
>>>
>>> Basically what I do is: for every document in the database, I loop  
>>> over
>>> the
>>> words that appear in the query (the query is long in a size of an  
>>> article)
>>> and for every word that appears in each document I take the score  
>>> from the
>>> index mentioned above and calculating a score between the query  
>>> and each
>>> document.
>>>
>>> Any suggestion how to do it using Lucene search? How to add these  
>>> values
>>> to
>>> the searcher?
>>>
>>
>> Thinking...
>>
>>
>>>
>>> I looked at the boosting option, but couldn't really see how it  
>>> helps me
>>> to
>>> that matter.
>>>
>>
>> What "boosting option" did you look at?  Can you explain a bit more?
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
>> using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message