lucene-java-user mailing list archives

From Peter Keegan <peterlkee...@gmail.com>
Subject Re: PayloadNearQuery and AveragePayloadFunction
Date Thu, 02 Feb 2012 21:39:35 GMT
I don't quite follow what you're doing, but is it possible that your
payloads were not on the desired terms when you indexed them? The first
explanation shows that the matching document contained "luteinizing
hormone" in both fields, 'AbstractText' and 'ArticleTitle'. The average
payload value was 3.0, so either both terms had payloads that averaged
3.0 or only one had a payload of 3.0. In the second query, the phrase was
found in both fields again, but no payloads were found (thus the 1.0).
According to your 'scorePayload' method, the first match would return 3
only if semantic=A. But the Similarity class is associated with an
IndexReader, so the same 'semantic' would be used for all queries.
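
To illustrate the averaging described above, here is a minimal sketch in plain Java. It is not Lucene's code: the class and method names are hypothetical, and it assumes the function averages only the scores of payloads that were actually seen and falls back to 1 when none were seen.

```java
public class AveragePayloadSketch {

    // Hypothetical stand-in for the averaging step: the mean of the scores
    // returned for payloads that were actually seen, or 1 when none were seen.
    public static float averagePayloadScore(float... payloadScores) {
        if (payloadScores.length == 0) {
            return 1.0f;
        }
        float sum = 0.0f;
        for (float score : payloadScores) {
            sum += score;
        }
        return sum / payloadScores.length;
    }

    public static void main(String[] args) {
        // Both matched terms carry a payload scored 3.
        System.out.println(averagePayloadScore(3.0f, 3.0f));
        // Only one matched term carries a payload, scored 3.
        System.out.println(averagePayloadScore(3.0f));
        // No payloads on the matched terms at all.
        System.out.println(averagePayloadScore());
    }
}
```

Under that assumption, the first two cases both average to 3.0 and cannot be told apart from the explain output alone, while the third gives the default of 1.0.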

Peter


On Thu, Feb 2, 2012 at 11:57 AM, shyama <shyamasree_saha@yahoo.com> wrote:

> Hi List
> Apologies for such a long message. I have tried to include everything that
> you might need to know to answer my question.
>
> I am having difficulty understanding what AveragePayloadFunction is doing.
> Here is my example:
>
> Title:Human|9 pineal|5 luteinizing hormone receptors.
> Text:The presence of luteinizing hormone receptors in human|9 pineal|5
> glands from five females and three males, ranging in age from 61-89 yr, was
> examined by in situ hybridization and immunocytochemistry. The results
> demonstrated the presence of these receptors at the mRNA|7 and protein
> levels in all the pineal|5 glands examined. Pineal|5 gland luteinizing
> hormone receptors could potentially be involved in the regulation of
> melatonin|7 synthesis.
>
> 3 is for class A
> 5 is for class B
> 7 is for class C
> 9 is for class D
> These are the payloads stored in the index. At search time I use these
> values only to identify a term's class, and then return 3 for terms in the
> selected class.
>
> I am using WhitespaceTokenizer and LowerCaseFilter. In my PayloadSimilarity
> class, I manipulate the payload so that, if I am interested in class A, it
> returns the payload value "x=3" only for terms in class A. I decide a term's
> class by checking its payload value.
>
> Now, I query for "luteinizing hormone" using PayloadNearQuery with slop of
> 5. First I try with interest in class B and next with interest in class A.
>
> *Result of Class A interest:*
>
> Explain: 10.97332 = (MATCH) sum of:
>  2.5589073 = (MATCH) weight(payloadNear([AbstractText:luteinizing, AbstractText:hormone], 5, true) in 5362133), product of:
>    0.68000716 = queryWeight(payloadNear([AbstractText:luteinizing, AbstractText:hormone], 5, true)), product of:
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.048413463 = queryNorm
>    3.7630591 = (MATCH) fieldWeight(AbstractText:payloadNear([luteinizing, hormone], 5, true) in 5362133), product of:
>      2.4494898 = PayloadNearQuery, product of:
>        0.8164966 = tf(phraseFreq=0.6666667)
>        *3.0 = AveragePayloadFunction(...)*
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.109375 = fieldNorm(field=AbstractText, doc=5362133)
>  8.4144125 = (MATCH) weight(payloadNear([ArticleTitle:luteinizing, ArticleTitle:hormone], 5, true) in 5362133), product of:
>    0.7332054 = queryWeight(payloadNear([ArticleTitle:luteinizing, ArticleTitle:hormone], 5, true)), product of:
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.048413463 = queryNorm
>    11.476201 = (MATCH) fieldWeight(ArticleTitle:payloadNear([luteinizing, hormone], 5, true) in 5362133), product of:
>      1.7320508 = PayloadNearQuery, product of:
>        0.57735026 = tf(phraseFreq=0.33333334)
>        *3.0 = AveragePayloadFunction(...)*
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.4375 = fieldNorm(field=ArticleTitle, doc=5362133)
> ---------------------------------------------------------------------
>
> *Result of Class B Interest:*
>
> Explain: 3.657773 = (MATCH) sum of:
>  0.85296905 = (MATCH) weight(payloadNear([AbstractText:luteinizing, AbstractText:hormone], 5, true) in 5362133), product of:
>    0.68000716 = queryWeight(payloadNear([AbstractText:luteinizing, AbstractText:hormone], 5, true)), product of:
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.048413463 = queryNorm
>    1.254353 = (MATCH) fieldWeight(AbstractText:payloadNear([luteinizing, hormone], 5, true) in 5362133), product of:
>      0.8164966 = PayloadNearQuery, product of:
>        0.8164966 = tf(phraseFreq=0.6666667)
>        *1.0 = AveragePayloadFunction(...)*
>      14.045828 = idf(AbstractText:  luteinizing=15481 hormone=164637)
>      0.109375 = fieldNorm(field=AbstractText, doc=5362133)
>  2.804804 = (MATCH) weight(payloadNear([ArticleTitle:luteinizing, ArticleTitle:hormone], 5, true) in 5362133), product of:
>    0.7332054 = queryWeight(payloadNear([ArticleTitle:luteinizing, ArticleTitle:hormone], 5, true)), product of:
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.048413463 = queryNorm
>    3.8254004 = (MATCH) fieldWeight(ArticleTitle:payloadNear([luteinizing, hormone], 5, true) in 5362133), product of:
>      0.57735026 = PayloadNearQuery, product of:
>        0.57735026 = tf(phraseFreq=0.33333334)
>        *1.0 = AveragePayloadFunction(...)*
>      15.144659 = idf(ArticleTitle:  hormone=86980 luteinizing=9765)
>      0.4375 = fieldNorm(field=ArticleTitle, doc=5362133)
>
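
The two explain outputs above differ only in the AveragePayloadFunction factor, which is why the class A total (10.97332) is three times the class B total (3.657773). As a rough check, the sketch below multiplies out the AbstractText branch from the numbers in the explain output. The class and method names are hypothetical, and it assumes tf is the square root of the phrase frequency, as the printed tf values suggest.

```java
public class ExplainArithmeticSketch {

    // Assumed tf for the sloppy phrase match: square root of the phrase frequency.
    public static float tf(float phraseFreq) {
        return (float) Math.sqrt(phraseFreq);
    }

    // fieldWeight as the explain output multiplies it out:
    // (tf * average payload score) * idf * fieldNorm.
    public static float fieldWeight(float phraseFreq, float payloadScore,
                                    float idf, float fieldNorm) {
        return tf(phraseFreq) * payloadScore * idf * fieldNorm;
    }

    // queryWeight as the explain output multiplies it out: idf * queryNorm.
    public static float queryWeight(float idf, float queryNorm) {
        return idf * queryNorm;
    }

    public static void main(String[] args) {
        // Values copied from the AbstractText branch of the explain output.
        float idf = 14.045828f;
        float fieldNorm = 0.109375f;
        float queryNorm = 0.048413463f;
        float phraseFreq = 0.6666667f;

        float qw = queryWeight(idf, queryNorm);
        // Payload factor 3.0 (first explain) versus 1.0 (second explain).
        System.out.println(qw * fieldWeight(phraseFreq, 3.0f, idf, fieldNorm));
        System.out.println(qw * fieldWeight(phraseFreq, 1.0f, idf, fieldNorm));
    }
}
```

The two printed values should come out close to the 2.5589073 and 0.85296905 reported for the AbstractText branch, so the rest of the scoring is identical and only the payload factor changes between the two runs.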
> As I understand it, when I am interested in class B I should get 3 from
> AveragePayloadFunction, whereas for class A I should get 1, because there is
> no class A term in the text and so everything will have a payload score of 1.
> For class B there is one term in the "Title" field, so the value returned by
> AveragePayloadFunction should be 3.
>
> I do not understand what is going on. Maybe I am not understanding exactly
> what AveragePayloadFunction is doing.
>
> My similarity class is as follows:
>
> import org.apache.lucene.analysis.payloads.PayloadHelper;
> import org.apache.lucene.search.DefaultSimilarity;
>
> public class PayloadSearchSimilarity extends DefaultSimilarity {
>
>     private static final long serialVersionUID = 1L;
>
>     // Class of interest ("A" to "D"). Static, so it is shared by every query.
>     public static String semantic;
>
>     @Override
>     public float scorePayload(int docId, String fieldName, int start, int end,
>             byte[] bytes, int offset, int length) {
>         if (bytes == null) {
>             // No payload on this term.
>             return 1;
>         }
>         float payload = PayloadHelper.decodeFloat(bytes, offset);
>         // I am now returning the same payload for all semantic types so that
>         // we can compare the scores. It was changed after we showed it to
>         // Dietrich.
>         if ("A".equals(semantic) && payload == 3) {
>             return 3;
>         }
>         if ("B".equals(semantic) && payload == 5) {
>             return 3;
>         }
>         if ("C".equals(semantic) && payload == 7) {
>             System.out.println("Semantic:" + semantic);
>             return 3;
>         }
>         if ("D".equals(semantic) && payload == 9) {
>             System.out.println("Semantic:" + semantic);
>             return 3;
>         }
>         // The term's class does not match the class of interest.
>         return 1;
>     }
> }
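
Because 'semantic' is a static field, there is one value for the whole JVM, not one per query or per similarity instance. The sketch below shows that effect in plain Java. SketchSimilarity is a hypothetical stand-in for the class above, reduced to classes A and B, and does not use Lucene.

```java
public class StaticSemanticSketch {

    // Hypothetical stand-in for the similarity class in the message above.
    public static class SketchSimilarity {
        // One shared value for every instance and every caller.
        public static String semantic;

        public float scorePayload(float payload) {
            if ("A".equals(semantic) && payload == 3) {
                return 3;
            }
            if ("B".equals(semantic) && payload == 5) {
                return 3;
            }
            return 1;
        }
    }

    public static void main(String[] args) {
        SketchSimilarity first = new SketchSimilarity();
        SketchSimilarity second = new SketchSimilarity();

        SketchSimilarity.semantic = "A";
        // A class A payload (3) is boosted while the shared value is "A".
        System.out.println(first.scorePayload(3));

        SketchSimilarity.semantic = "B";
        // The same instance no longer boosts class A payloads ...
        System.out.println(first.scorePayload(3));
        // ... and every other instance now scores for class B as well.
        System.out.println(second.scorePayload(5));
    }
}
```

If the value is changed between searches, whichever assignment ran last decides the scoring for every query that follows, so it is worth confirming what 'semantic' holds at the moment each query executes.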
>
> I am really puzzled. It would be really helpful if someone could help.
>
> Looking forward to hearing from you.
> Many Thanks
> Shyama
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/PayloadNearQuery-and-AveragePayloadFunction-tp3710454p3710454.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
