lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Subject Re: similarity function
Date Thu, 05 Mar 2009 20:05:16 GMT
Hi Seid,

Do you have a reference for the article?  I've done some QA in my day,  
but don't recall reading that one.

At any rate, I do think it is possible to do what you are after.  See  

On Mar 5, 2009, at 9:49 AM, Seid Mohammed wrote:

> For my work, I have read an article stating that " Answer type can be
> automatically constructed by Indexing Different Questions and Answer
> types. Later, when an unseen question apears, answer type for this
> question will be found with the help of 'similarity function'
> computation"
> so I am clear with the arguement above. my problem is,
> 1. how can I index individual questions and Answer types as is ( not  
> tokenized

I'm not sure you want this, but when constructing your Field, just use  
the NOT_ANALYZED option.

> 2. how can I calculate the similarity between indexed questions and
> and unseen questions (question of any type that can be asked latter)

In line with #1, I think you might be better off to actually tokenize  
the question as one one field, and the answer type as a second field.   
Then, you can let Lucene calculate similarity via it's normal query  
mechanisms.  In this case, I would like try experimenting with things  
like: exact match, phrase queries with slop, etc.  That way, not only  
can you match "Who is the president of UN" but you might also match on  
things that are a bit fuzzier.  To do this, you might need to have  
several fields per document with variations.  I could also see using  
Lucene's payload mechanism as well.

But, as Vasu said, you will likely need other parts too, like OpenNLP.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message