lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Scores between words. Boosting?
Date Wed, 18 Mar 2009 02:27:42 GMT

On Mar 17, 2009, at 5:44 AM, liat oren wrote:

> Thanks for all the answers.
> I am new to Lucene, and these emails are the first time I have heard of
> bigrams, so I read about them a bit.
> Question - if I query for "cat animal" - or use boosting - "cat^2
> animal^0.5" - will the results return ONLY documents that contain both?
> From what I have seen so far, it can also show documents that contain
> only one of them, no?

I think if you are using bigrams, then you would only match on one,
but if you do the prefix/wildcard approach you could match on either.
I'm not sure you will be able to pull off both the individual term
boosting and the bigrams.  You will likely need to write your own
Query classes to do that.
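
To make the "both or either" part of the question concrete, here is a
minimal sketch in plain Java. It is a toy model, not Lucene's actual
scoring and not a Lucene API: the class name BoostedTermMatch and the
score method are made up for illustration, and the score is simply the
sum of the boosts of the query terms found in a document. As far as I
know, the stock QueryParser treats "cat animal" as an OR of the two
terms by default, which is the optional-terms case below.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Lucene classes or Lucene scoring.
public class BoostedTermMatch {

    // Sums the boosts of the query terms present in the document.
    // A result of 0.0 means the document does not match.
    // requireAll = false: terms are optional (OR), any one term matches.
    // requireAll = true:  terms are required (AND), all must be present.
    static double score(Map<String, Double> boostedTerms,
                        Set<String> docTerms,
                        boolean requireAll) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : boostedTerms.entrySet()) {
            if (docTerms.contains(e.getKey())) {
                total += e.getValue();
            } else if (requireAll) {
                return 0.0; // a required term is missing
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the query "cat^2 animal^0.5".
        Map<String, Double> query = new LinkedHashMap<String, Double>();
        query.put("cat", 2.0);
        query.put("animal", 0.5);

        Set<String> both = new HashSet<String>(Arrays.asList("cat", "animal", "pet"));
        Set<String> catOnly = new HashSet<String>(Arrays.asList("cat", "pet"));

        System.out.println(score(query, both, false));    // 2.5
        System.out.println(score(query, catOnly, false)); // 2.0
        System.out.println(score(query, catOnly, true));  // 0.0
    }
}
```

With optional terms, the document containing only "cat" still matches,
just with a lower score than the one containing both. Requiring every
term is what restricts the results to documents that contain both.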

If you don't mind me asking, what is the problem you are trying to
solve?  I know the solution you want (I think, namely boosted bigrams
of some sort), but I'm still clueless about the problem, and I think
that is really hindering my ability to help.  It sounds like some type
of co-occurrence problem, but I'm not sure.  Is there a broader
category that what you are doing fits into?  If you can't say, that is
fine, too.  It may be some proprietary thing.

> Can you please elaborate a bit more on your suggestion?
> I read a bit on the synonyms and the wordNet package.
> Isn't there a way to use an index that is structured in the same way as
> the index of the wordNet (any idea how this index is built?), but
> stores other values?
