lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From henok sahilu <henok_sah...@yahoo.com>
Subject Re: A question about the relevancy
Date Tue, 28 Jul 2009 11:34:30 GMT
hello there 
is there anyone who can tell me how to set up an XML search engine.
please give an open source written in java
thanks 
henok

--- On Thu, 7/23/09, Erick Erickson <erickerickson@gmail.com> wrote:

From: Erick Erickson <erickerickson@gmail.com>
Subject: Re: A question about the relevancy
To: java-user@lucene.apache.org
Date: Thursday, July 23, 2009, 4:41 PM

Also, see http://wiki.apache.org/lucene-java/ScoresAsPercentages. The
relevancy here is that comparing scores across different queries is fairly
meaningless, even if you *do* know how that score was arrived at...

Best
Erick

On Thu, Jul 23, 2009 at 6:17 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hi Pedro,
>
> Lucene's Explanation will show you all the juicy details:
>
> http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Scorer.html#explain(int)<http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Scorer.html#explain%28int%29>
>
> But with a query like that, I'm not sure if you'll be able to follow
> everything.  Maybe pick a super simple pair of queries instead, and look at
> Explanation for those simple queries to see what's happening and how they
> are being scored.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message ----
> > From: "Naranjo, Pedro" <Pedro_Naranjo@stercomm.com>
> > To: java-user@lucene.apache.org
> > Sent: Thursday, July 23, 2009 5:44:34 PM
> > Subject: A question about the relevancy
> >
> > Hi there,
> >
> > I have a question… we have two querys which only different is the fact
> that
> > Query_1 includes phrase queries where Query_2  has the phrase query but
> > converted into a Boolean query.
> >
> > When each query is executed, Query_1 gives a relevancy of 1.0 and Query_2
> gives
> > one of 0.34. The question is why? Wouldn’t it the fact that each phrase
> query is
> > now rewritten as a Boolean query give you a higher ranking as expected
> because
> > it is matching every keyword in any order?
> >
> > Please let me know if you see anything else that escaped me.
> >
> > Sincerely,
> >
> >
> > Pedro Naranjo
> >
> > ***START********************QUERY_1**************************************
> > (+prod.MIC.MIProductName:rotarix^0.01
> > +((((
> >     prod.MI.MITitle:"yyy yyi zzz"
> >     prod.MI.MITitle:"latex allergies allergi"
> >     prod.MI.MITitle:"administration administr"
> >     ()
> >     prod.MI.MITitle:rotarix
> >     ()
> >     prod.MI.MITitle:"individuals individu"
> >     ()
> >     prod.MI.MITitle:"yyy yyi"
> >     prod.MI.MITitle:zzz
> > (
> >     prod.MI.MITitle:"yyy yyi zzz"
> >     prod.MI.MITitle:"administration administr"
> >     prod.MI.MITitle:rotarix
> >     prod.MI.MITitle:"individuals individu"
> >     prod.MI.MITitle:"yyy yyi"
> >     prod.MI.MITitle:zzz))^15.0)
> > ((
> >     prod.MIFC.MIFileContent:"yyy yyi zzz"
> >     prod.MIFC.MIFileContent:"latex allergies allergi"
> >     prod.MIFC.MIFileContent:"administration administr"
> >     ()
> >     prod.MIFC.MIFileContent:rotarix
> >     ()
> >     prod.MIFC.MIFileContent:"individuals individu"
> >     ()
> >     prod.MIFC.MIFileContent:"yyy yyi"
> >     prod.MIFC.MIFileContent:zzz
> > (
> >     prod.MIFC.MIFileContent:"yyy yyi zzz"
> >     prod.MIFC.MIFileContent:"administration administr"
> >     prod.MIFC.MIFileContent:rotarix
> >     prod.MIFC.MIFileContent:"individuals individu"
> >     prod.MIFC.MIFileContent:"yyy yyi"
> >     prod.MIFC.MIFileContent:zzz))^1.5)
> > ((
> >     prod.MIC.MIKeyword:"yyy yyi zzz"
> >     prod.MIC.MIKeyword:"latex allergies allergi"
> >     prod.MIC.MIKeyword:"administration administr"
> >     ()
> >     prod.MIC.MIKeyword:rotarix
> >     ()
> >     prod.MIC.MIKeyword:"individuals individu"
> >     ()
> >     prod.MIC.MIKeyword:"yyy yyi"
> >     prod.MIC.MIKeyword:zzz
> > (
> >     prod.MIC.MIKeyword:"yyy yyi zzz"
> >     prod.MIC.MIKeyword:"administration administr"
> >     prod.MIC.MIKeyword:rotarix
> >     prod.MIC.MIKeyword:"individuals individu"
> >     prod.MIC.MIKeyword:"yyy yyi"
> >     prod.MIC.MIKeyword:zzz))^10.0))))
> > ***END**********************QUERY_1**************************************
> > ***START********************QUERY_2**************************************
> > (+prod.MIC.MIProductName:rotarix^0.01
> > +((((
> >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi +prod.MI.MITitle:zzz)
> >     (+prod.MI.MITitle:latex +prod.MI.MITitle:allergies
> +prod.MI.MITitle:allergi)
> >
> >     (+prod.MI.MITitle:administration +prod.MI.MITitle:administr)
> >     ()
> >     prod.MI.MITitle:rotarix
> >     ()
> >     (+prod.MI.MITitle:individuals +prod.MI.MITitle:individu)
> >     ()
> >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi)
> >     prod.MI.MITitle:zzz
> > (
> >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi +prod.MI.MITitle:zzz)
> >     (+prod.MI.MITitle:administration +prod.MI.MITitle:administr)
> >     prod.MI.MITitle:rotarix
> >     (+prod.MI.MITitle:individuals +prod.MI.MITitle:individu)
> >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi)
> >     prod.MI.MITitle:zzz))^15.0)
> > ((
> >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi
> > +prod.MIFC.MIFileContent:zzz)
> >     (+prod.MIFC.MIFileContent:latex +prod.MIFC.MIFileContent:allergies
> > +prod.MIFC.MIFileContent:allergi)
> >     (+prod.MIFC.MIFileContent:administration
> +prod.MIFC.MIFileContent:administr)
> >
> >     ()
> >     prod.MIFC.MIFileContent:rotarix
> >     ()
> >     (+prod.MIFC.MIFileContent:individuals
> +prod.MIFC.MIFileContent:individu)
> >     ()
> >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi)
> >     prod.MIFC.MIFileContent:zzz
> > (
> >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi
> > +prod.MIFC.MIFileContent:zzz)
> >     (+prod.MIFC.MIFileContent:administration
> +prod.MIFC.MIFileContent:administr)
> >
> >     prod.MIFC.MIFileContent:rotarix
> >     (+prod.MIFC.MIFileContent:individuals
> +prod.MIFC.MIFileContent:individu)
> >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi)
> >     prod.MIFC.MIFileContent:zzz))^1.5)
> > ((
> >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi
> +prod.MIC.MIKeyword:zzz)
> >     (+prod.MIC.MIKeyword:latex +prod.MIC.MIKeyword:allergies
> > +prod.MIC.MIKeyword:allergi)
> >     (+prod.MIC.MIKeyword:administration +prod.MIC.MIKeyword:administr)
> >     ()
> >     prod.MIC.MIKeyword:rotarix
> >     ()
> >     (+prod.MIC.MIKeyword:individuals +prod.MIC.MIKeyword:individu)
> >     ()
> >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi)
> >     prod.MIC.MIKeyword:zzz
> > (
> >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi
> +prod.MIC.MIKeyword:zzz)
> >     (+prod.MIC.MIKeyword:administration +prod.MIC.MIKeyword:administr)
> >     prod.MIC.MIKeyword:rotarix
> >     (+prod.MIC.MIKeyword:individuals +prod.MIC.MIKeyword:individu)
> >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi)
> >     prod.MIC.MIKeyword:zzz))^10.0))))
> > ***END**********************QUERY_2**************************************
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



      
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message