lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: A question about the relevancy
Date Tue, 28 Jul 2009 16:28:53 GMT
Please start a new thread when you are asking a question on adifferent
topic. See: http://people.apache.org/~hossman/#threadhijack

<http://people.apache.org/~hossman/#threadhijack>When you open a new topic,
please elaborate a bit more on what
you're trying to accomplish. "An XML search engine" doesn't give
us very much to work with.

Best
Erick


On Tue, Jul 28, 2009 at 7:34 AM, henok sahilu <henok_sahilu@yahoo.com>wrote:

> hello there
> is there anyone who can tell me how to set up an XML search engine.
> please give an open source written in java
> thanks
> henok
>
> --- On Thu, 7/23/09, Erick Erickson <erickerickson@gmail.com> wrote:
>
> From: Erick Erickson <erickerickson@gmail.com>
> Subject: Re: A question about the relevancy
> To: java-user@lucene.apache.org
> Date: Thursday, July 23, 2009, 4:41 PM
>
> Also, see http://wiki.apache.org/lucene-java/ScoresAsPercentages. The
> relevancy here is that comparing scores across different queries is fairly
> meaningless, even if you *do* know how that score was arrived at...
>
> Best
> Erick
>
> On Thu, Jul 23, 2009 at 6:17 PM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
>
> > Hi Pedro,
> >
> > Lucene's Explanation will show you all the juicy details:
> >
> >
> http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Scorer.html#explain(int)
> <
> http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Scorer.html#explain%28int%29
> >
> >
> > But with a query like that, I'm not sure if you'll be able to follow
> > everything.  Maybe pick a super simple pair of queries instead, and look
> at
> > Explanation for those simple queries to see what's happening and how they
> > are being scored.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > ----- Original Message ----
> > > From: "Naranjo, Pedro" <Pedro_Naranjo@stercomm.com>
> > > To: java-user@lucene.apache.org
> > > Sent: Thursday, July 23, 2009 5:44:34 PM
> > > Subject: A question about the relevancy
> > >
> > > Hi there,
> > >
> > > I have a question… we have two querys which only different is the fact
> > that
> > > Query_1 includes phrase queries where Query_2  has the phrase query but
> > > converted into a Boolean query.
> > >
> > > When each query is executed, Query_1 gives a relevancy of 1.0 and
> Query_2
> > gives
> > > one of 0.34. The question is why? Wouldn’t it the fact that each phrase
> > query is
> > > now rewritten as a Boolean query give you a higher ranking as expected
> > because
> > > it is matching every keyword in any order?
> > >
> > > Please let me know if you see anything else that escaped me.
> > >
> > > Sincerely,
> > >
> > >
> > > Pedro Naranjo
> > >
> > >
> ***START********************QUERY_1**************************************
> > > (+prod.MIC.MIProductName:rotarix^0.01
> > > +((((
> > >     prod.MI.MITitle:"yyy yyi zzz"
> > >     prod.MI.MITitle:"latex allergies allergi"
> > >     prod.MI.MITitle:"administration administr"
> > >     ()
> > >     prod.MI.MITitle:rotarix
> > >     ()
> > >     prod.MI.MITitle:"individuals individu"
> > >     ()
> > >     prod.MI.MITitle:"yyy yyi"
> > >     prod.MI.MITitle:zzz
> > > (
> > >     prod.MI.MITitle:"yyy yyi zzz"
> > >     prod.MI.MITitle:"administration administr"
> > >     prod.MI.MITitle:rotarix
> > >     prod.MI.MITitle:"individuals individu"
> > >     prod.MI.MITitle:"yyy yyi"
> > >     prod.MI.MITitle:zzz))^15.0)
> > > ((
> > >     prod.MIFC.MIFileContent:"yyy yyi zzz"
> > >     prod.MIFC.MIFileContent:"latex allergies allergi"
> > >     prod.MIFC.MIFileContent:"administration administr"
> > >     ()
> > >     prod.MIFC.MIFileContent:rotarix
> > >     ()
> > >     prod.MIFC.MIFileContent:"individuals individu"
> > >     ()
> > >     prod.MIFC.MIFileContent:"yyy yyi"
> > >     prod.MIFC.MIFileContent:zzz
> > > (
> > >     prod.MIFC.MIFileContent:"yyy yyi zzz"
> > >     prod.MIFC.MIFileContent:"administration administr"
> > >     prod.MIFC.MIFileContent:rotarix
> > >     prod.MIFC.MIFileContent:"individuals individu"
> > >     prod.MIFC.MIFileContent:"yyy yyi"
> > >     prod.MIFC.MIFileContent:zzz))^1.5)
> > > ((
> > >     prod.MIC.MIKeyword:"yyy yyi zzz"
> > >     prod.MIC.MIKeyword:"latex allergies allergi"
> > >     prod.MIC.MIKeyword:"administration administr"
> > >     ()
> > >     prod.MIC.MIKeyword:rotarix
> > >     ()
> > >     prod.MIC.MIKeyword:"individuals individu"
> > >     ()
> > >     prod.MIC.MIKeyword:"yyy yyi"
> > >     prod.MIC.MIKeyword:zzz
> > > (
> > >     prod.MIC.MIKeyword:"yyy yyi zzz"
> > >     prod.MIC.MIKeyword:"administration administr"
> > >     prod.MIC.MIKeyword:rotarix
> > >     prod.MIC.MIKeyword:"individuals individu"
> > >     prod.MIC.MIKeyword:"yyy yyi"
> > >     prod.MIC.MIKeyword:zzz))^10.0))))
> > >
> ***END**********************QUERY_1**************************************
> > >
> ***START********************QUERY_2**************************************
> > > (+prod.MIC.MIProductName:rotarix^0.01
> > > +((((
> > >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi +prod.MI.MITitle:zzz)
> > >     (+prod.MI.MITitle:latex +prod.MI.MITitle:allergies
> > +prod.MI.MITitle:allergi)
> > >
> > >     (+prod.MI.MITitle:administration +prod.MI.MITitle:administr)
> > >     ()
> > >     prod.MI.MITitle:rotarix
> > >     ()
> > >     (+prod.MI.MITitle:individuals +prod.MI.MITitle:individu)
> > >     ()
> > >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi)
> > >     prod.MI.MITitle:zzz
> > > (
> > >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi +prod.MI.MITitle:zzz)
> > >     (+prod.MI.MITitle:administration +prod.MI.MITitle:administr)
> > >     prod.MI.MITitle:rotarix
> > >     (+prod.MI.MITitle:individuals +prod.MI.MITitle:individu)
> > >     (+prod.MI.MITitle:yyy +prod.MI.MITitle:yyi)
> > >     prod.MI.MITitle:zzz))^15.0)
> > > ((
> > >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi
> > > +prod.MIFC.MIFileContent:zzz)
> > >     (+prod.MIFC.MIFileContent:latex +prod.MIFC.MIFileContent:allergies
> > > +prod.MIFC.MIFileContent:allergi)
> > >     (+prod.MIFC.MIFileContent:administration
> > +prod.MIFC.MIFileContent:administr)
> > >
> > >     ()
> > >     prod.MIFC.MIFileContent:rotarix
> > >     ()
> > >     (+prod.MIFC.MIFileContent:individuals
> > +prod.MIFC.MIFileContent:individu)
> > >     ()
> > >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi)
> > >     prod.MIFC.MIFileContent:zzz
> > > (
> > >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi
> > > +prod.MIFC.MIFileContent:zzz)
> > >     (+prod.MIFC.MIFileContent:administration
> > +prod.MIFC.MIFileContent:administr)
> > >
> > >     prod.MIFC.MIFileContent:rotarix
> > >     (+prod.MIFC.MIFileContent:individuals
> > +prod.MIFC.MIFileContent:individu)
> > >     (+prod.MIFC.MIFileContent:yyy +prod.MIFC.MIFileContent:yyi)
> > >     prod.MIFC.MIFileContent:zzz))^1.5)
> > > ((
> > >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi
> > +prod.MIC.MIKeyword:zzz)
> > >     (+prod.MIC.MIKeyword:latex +prod.MIC.MIKeyword:allergies
> > > +prod.MIC.MIKeyword:allergi)
> > >     (+prod.MIC.MIKeyword:administration +prod.MIC.MIKeyword:administr)
> > >     ()
> > >     prod.MIC.MIKeyword:rotarix
> > >     ()
> > >     (+prod.MIC.MIKeyword:individuals +prod.MIC.MIKeyword:individu)
> > >     ()
> > >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi)
> > >     prod.MIC.MIKeyword:zzz
> > > (
> > >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi
> > +prod.MIC.MIKeyword:zzz)
> > >     (+prod.MIC.MIKeyword:administration +prod.MIC.MIKeyword:administr)
> > >     prod.MIC.MIKeyword:rotarix
> > >     (+prod.MIC.MIKeyword:individuals +prod.MIC.MIKeyword:individu)
> > >     (+prod.MIC.MIKeyword:yyy +prod.MIC.MIKeyword:yyi)
> > >     prod.MIC.MIKeyword:zzz))^10.0))))
> > >
> ***END**********************QUERY_2**************************************
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message