lucene-java-user mailing list archives

From shyama <shyamasree_s...@yahoo.com>
Subject PayloadNearQuery and AveragePayloadFunction
Date Thu, 02 Feb 2012 16:57:00 GMT
Hi List
Apologies for such a long message. I have tried to include everything that
you might need to know to answer my question.

I am having difficulty understanding what AveragePayloadFunction is
doing. Here is my example:

Title:Human|9 pineal|5 luteinizing hormone receptors.
Text:The presence of luteinizing hormone receptors in human|9 pineal|5
glands from five females and three males, ranging in age from 61-89 yr, was
examined by in situ hybridization and immunocytochemistry. The results
demonstrated the presence of these receptors at the mRNA|7 and protein
levels in all the pineal|5 glands examined. Pineal|5 gland luteinizing
hormone receptors could potentially be involved in the regulation of
melatonin|7 synthesis.

3 is for class A
5 is for class B
7 is for class C
9 is for class D
These are the payloads stored in the index. At search time I use these
values only to identify each term's class: my similarity returns 3 for terms
in the selected class and 1 for all other terms.

I am using WhitespaceTokenizer and LowerCaseFilter. In my similarity class
(PayloadSearchSimilarity), scorePayload decides a term's class by checking
its payload value; if I am interested in class A, it returns 3 only for
terms in class A and 1 for every other term.
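
To make the term|N notation concrete, here is a minimal stand-alone sketch of how the title above would be split into tokens and payloads. It assumes whitespace splitting (punctuation stays attached), lowercasing, and that the text after '|' is a float payload while tokens without '|' are indexed with no payload. DelimitedToken and classOf are hypothetical names for illustration, not Lucene APIs, and this is not necessarily how the actual analyzer chain attaches payloads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the analysis: whitespace split, lowercase, and
// "term|N" read as a term with float payload N. Tokens without '|' are
// ASSUMED to carry no payload (payload == null).
final class DelimitedToken {
    final String term;
    final Float payload; // null when the token carries no payload

    DelimitedToken(String term, Float payload) {
        this.term = term;
        this.payload = payload;
    }

    static List<DelimitedToken> parse(String text) {
        List<DelimitedToken> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            int bar = raw.indexOf('|');
            // Lowercase the term part only; punctuation is kept, as with whitespace tokenization.
            String term = (bar < 0 ? raw : raw.substring(0, bar)).toLowerCase(Locale.ROOT);
            Float payload = bar < 0 ? null : Float.valueOf(raw.substring(bar + 1));
            tokens.add(new DelimitedToken(term, payload));
        }
        return tokens;
    }

    // Payload-to-class mapping from the message: 3=A, 5=B, 7=C, 9=D.
    static String classOf(Float payload) {
        if (payload == null) return null;
        float p = payload;
        if (p == 3f) return "A";
        if (p == 5f) return "B";
        if (p == 7f) return "C";
        if (p == 9f) return "D";
        return null;
    }

    @Override
    public String toString() {
        return payload == null ? term : term + "|" + payload;
    }
}
```

For the title "Human|9 pineal|5 luteinizing hormone receptors.", parse() yields the tokens human|9.0, pineal|5.0, luteinizing, hormone and receptors.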

Now I query for "luteinizing hormone" using a PayloadNearQuery with a slop of
5, first with interest in class B and then with interest in class A.

*Result of Class A interest:*

Explain: 10.97332 = (MATCH) sum of:
  2.5589073 = (MATCH) weight(payloadNear([AbstractText:luteinizing,
AbstractText:hormone], 5, true) in 5362133), product of:
    0.68000716 = queryWeight(payloadNear([AbstractText:luteinizing,
AbstractText:hormone], 5, true)), product of:
      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
      0.048413463 = queryNorm
    3.7630591 = (MATCH) fieldWeight(AbstractText:payloadNear([luteinizing,
hormone], 5, true) in 5362133), product of:
      2.4494898 = PayloadNearQuery, product of:
        0.8164966 = tf(phraseFreq=0.6666667)
        *3.0 = AveragePayloadFunction(...)*
      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
      0.109375 = fieldNorm(field=AbstractText, doc=5362133)
  8.4144125 = (MATCH) weight(payloadNear([ArticleTitle:luteinizing,
ArticleTitle:hormone], 5, true) in 5362133), product of:
    0.7332054 = queryWeight(payloadNear([ArticleTitle:luteinizing,
ArticleTitle:hormone], 5, true)), product of:
      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
      0.048413463 = queryNorm
    11.476201 = (MATCH) fieldWeight(ArticleTitle:payloadNear([luteinizing,
hormone], 5, true) in 5362133), product of:
      1.7320508 = PayloadNearQuery, product of:
        0.57735026 = tf(phraseFreq=0.33333334)
        *3.0 = AveragePayloadFunction(...)*
      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
      0.4375 = fieldNorm(field=ArticleTitle, doc=5362133)
---------------------------------------------------------------------

*Result of Class B Interest:*

Explain: 3.657773 = (MATCH) sum of:
  0.85296905 = (MATCH) weight(payloadNear([AbstractText:luteinizing,
AbstractText:hormone], 5, true) in 5362133), product of:
    0.68000716 = queryWeight(payloadNear([AbstractText:luteinizing,
AbstractText:hormone], 5, true)), product of:
      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
      0.048413463 = queryNorm
    1.254353 = (MATCH) fieldWeight(AbstractText:payloadNear([luteinizing,
hormone], 5, true) in 5362133), product of:
      0.8164966 = PayloadNearQuery, product of:
        0.8164966 = tf(phraseFreq=0.6666667)
        *1.0 = AveragePayloadFunction(...)*
      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
      0.109375 = fieldNorm(field=AbstractText, doc=5362133)
  2.804804 = (MATCH) weight(payloadNear([ArticleTitle:luteinizing,
ArticleTitle:hormone], 5, true) in 5362133), product of:
    0.7332054 = queryWeight(payloadNear([ArticleTitle:luteinizing,
ArticleTitle:hormone], 5, true)), product of:
      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
      0.048413463 = queryNorm
    3.8254004 = (MATCH) fieldWeight(ArticleTitle:payloadNear([luteinizing,
hormone], 5, true) in 5362133), product of:
      0.57735026 = PayloadNearQuery, product of:
        0.57735026 = tf(phraseFreq=0.33333334)
        *1.0 = AveragePayloadFunction(...)*
      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
      0.4375 = fieldNorm(field=ArticleTitle, doc=5362133)
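
Both explain trees are plain products and sums, and they differ only in the AveragePayloadFunction factor. The stand-alone Java sketch below recomputes both totals from the printed factors, using only the structure visible in the explain output. ExplainArithmetic is a hypothetical name; tf = sqrt(phraseFreq) is DefaultSimilarity's tf, consistent with 0.8164966 = sqrt(0.6666667).

```java
// Stand-alone check of the explain arithmetic; all constants are copied from
// the two explain trees. Only avgPayload (the AveragePayloadFunction value)
// differs between the Class A run (3.0) and the Class B run (1.0).
final class ExplainArithmetic {
    static final double QUERY_NORM = 0.048413463;

    // One payloadNear clause = queryWeight * fieldWeight, where
    //   queryWeight = idf * queryNorm
    //   fieldWeight = tf(phraseFreq) * avgPayload * idf * fieldNorm
    //   tf(freq)    = sqrt(freq)
    static double clause(double phraseFreq, double avgPayload,
                         double idf, double fieldNorm) {
        double tf = Math.sqrt(phraseFreq);
        double fieldWeight = tf * avgPayload * idf * fieldNorm;
        double queryWeight = idf * QUERY_NORM;
        return queryWeight * fieldWeight;
    }

    // Sum of the AbstractText and ArticleTitle clauses for one run.
    static double total(double avgPayload) {
        double abstractText = clause(0.6666667, avgPayload, 14.045828, 0.109375);
        double articleTitle = clause(0.33333334, avgPayload, 15.144659, 0.4375);
        return abstractText + articleTitle;
    }
}
```

total(3.0) comes out at about 10.97332 and total(1.0) at about 3.657773, matching the two explains. Because the payload factor multiplies every clause, the Class A total is exactly three times the Class B total.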

As I understand it, when I am interested in class B I should get 3 from
AveragePayloadFunction, whereas for class A I should get 1: there is no
class A term in the text, so every term's payload score will be 1. For
class B, there is one class B term (pineal|5) in the Title field, so the
value returned by AveragePayloadFunction should be 3. The explanations above
show the reverse: 3.0 for class A and 1.0 for class B.

I do not understand what is going on. Maybe I am misunderstanding what
AveragePayloadFunction actually does.
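
For reference, AveragePayloadFunction is assumed here to behave as follows (based on a reading of the Lucene 3.x source, not confirmed in this thread): for the positions matched by the near query, each scorePayload() result is added to a running sum, and docScore() divides that sum by the number of payloads seen, returning 1 when no payload was seen at all. AveragePayloadModel is a hypothetical stand-alone model of that behavior, not the Lucene class.

```java
import java.util.List;

// Hypothetical stand-alone model (NOT the Lucene class) of how an averaging
// payload function is ASSUMED to combine per-position scorePayload() results.
final class AveragePayloadModel {

    // Assumed currentScore(): running sum of payload scores.
    static float currentScore(float currentScore, float currentPayloadScore) {
        return currentScore + currentPayloadScore;
    }

    // Assumed docScore(): the average, or 1 if no payload was seen.
    static float docScore(int numPayloadsSeen, float payloadScore) {
        return numPayloadsSeen > 0 ? payloadScore / numPayloadsSeen : 1f;
    }

    // Feeds each scorePayload() result through the two steps above.
    static float average(List<Float> scorePayloadResults) {
        float sum = 0f;
        int seen = 0;
        for (float s : scorePayloadResults) {
            sum = currentScore(sum, s);
            seen++;
        }
        return docScore(seen, sum);
    }
}
```

Under this model, payload scores of 3 and 1 average to 2, all-matching payloads average to 3, and a match where no payload is seen yields 1.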

My similarity class is as follows:

// Package names as in Lucene 3.x (assumed from the explain output format).
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

public class PayloadSearchSimilarity extends DefaultSimilarity {

	private static final long serialVersionUID = 1L;

	// Class of interest for the current search ("A", "B", "C" or "D").
	// It is static, so it must be set before each search and is shared by
	// every instance of this similarity.
	public static String semantic;

	// Same boost (3) for every class, so that scores can be compared across
	// classes. Payload values: A=3, B=5, C=7, D=9.
	@Override
	public float scorePayload(int docId, String fieldName, int start, int end,
			byte[] bytes, int offset, int length) {
		if (bytes == null) {
			return 1; // no payload stored at this position
		}
		float payload = PayloadHelper.decodeFloat(bytes, offset);
		// "A".equals(semantic) rather than semantic.equals("A") avoids a
		// NullPointerException when semantic has not been set.
		if (("A".equals(semantic) && payload == 3)
				|| ("B".equals(semantic) && payload == 5)
				|| ("C".equals(semantic) && payload == 7)
				|| ("D".equals(semantic) && payload == 9)) {
			return 3; // term belongs to the class of interest
		}
		return 1; // term class does not match the selected class
	}
}

I am really puzzled, and any help would be much appreciated.

I look forward to hearing from you.
Many Thanks
Shyama

--
View this message in context: http://lucene.472066.n3.nabble.com/PayloadNearQuery-and-AveragePayloadFunction-tp3710454p3710454.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
