lucene-java-user mailing list archives

From danield <danield...@gmail.com>
Subject Re: Similarity formula documentation is misleading + how to make field-agnostic queries?
Date Tue, 20 Jan 2015 00:18:42 GMT
Update: I have implemented my own subclasses of QueryParser, BooleanQuery,
BooleanScorer and Similarity to deal with this.

I have succeeded in getting exactly the behaviour I want... but only when
calling the .explain() method. The scores for some documents often differ
between IndexSearcher.search() and IndexSearcher.explain().

I am a bit confused by this. coord() seems to be one of the things I need
to change, but it is not the only factor in the formula that I have
evidently changed for the .explain() pipeline but not for .search().
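For reference when comparing the two code paths, here is a minimal pure-Java sketch of the classic practical scoring formula as documented for Lucene 4.x DefaultSimilarity: score(q,d) = coord(q,d) · queryNorm(q) · Σ tf(t in d) · idf(t)² · t.getBoost() · norm(t,d), with tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), coord = overlap / maxOverlap and lengthNorm = 1 / √(number of terms in the field). It assumes a query boost and index-time field boost of 1 and skips the lossy one-byte encoding that real norms go through, so its numbers will not match Lucene's output exactly; the class, record and method names are hypothetical. Checking each factor against it separately may help narrow down which one differs between explain() and search().

```java
import java.util.List;

/**
 * Hypothetical sketch of the classic (Lucene 4.x DefaultSimilarity-style) scoring formula.
 * Assumes query boost and index-time field boost of 1; norms are NOT byte-encoded as in Lucene.
 */
public class ClassicScoreSketch {
    /** One query term: frequency in this document, document frequency, query-time boost. */
    record TermStat(int freqInDoc, int docFreq, float boost) {}

    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static double coord(int overlap, int maxOverlap) { return (double) overlap / maxOverlap; }

    static double lengthNorm(int fieldLength) { return 1.0 / Math.sqrt(fieldLength); }

    /** score = coord * queryNorm * sum over matching terms of tf * idf^2 * boost * norm. */
    static double score(List<TermStat> terms, int numDocs, int fieldLength) {
        // queryNorm depends only on the query: 1 / sqrt(sum over all query terms of (idf * boost)^2)
        double sumOfSquaredWeights = 0;
        for (TermStat t : terms) {
            double w = idf(t.docFreq(), numDocs) * t.boost();
            sumOfSquaredWeights += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);

        double sum = 0;
        int overlap = 0; // number of query terms that actually occur in the document
        for (TermStat t : terms) {
            if (t.freqInDoc() == 0) continue; // absent term: no contribution, not counted by coord
            overlap++;
            double termIdf = idf(t.docFreq(), numDocs);
            sum += tf(t.freqInDoc()) * termIdf * termIdf * t.boost() * lengthNorm(fieldLength);
        }
        return coord(overlap, terms.size()) * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Two-term query; the document contains only the first term (4 times) in a 16-token field.
        List<TermStat> query = List.of(new TermStat(4, 9, 1f), new TermStat(0, 99, 1f));
        System.out.println(score(query, 1000, 16));
    }
}
```

With only one of the two query terms matching, coord contributes a factor of 0.5 here, which is exactly the kind of factor that has to be applied identically in both the explain and the search paths.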

The implementation of BulkScorer remains perplexing to me, and I suspect
the cause is something in there that I have missed. Any pointers?

Thanks!
Daniel
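A field-agnostic, class-weighted term weight of the kind discussed in this thread (one term frequency and one length norm over the whole text, with sentences from different classes weighted differently) might be sketched as follows. This is a hypothetical illustration, not Daniel's actual subclasses and not a Lucene API: the `Chunk` record, the `termWeight` method and the class labels are made up, and the √ tf damping and 1/√length norm simply mirror the classic formula.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical field-agnostic, class-weighted term weight. Not a Lucene API. */
public class ClassWeightedTfSketch {
    /** A labelled sentence chunk, e.g. ("results", "protein levels rose"). */
    record Chunk(String label, String text) {}

    /**
     * Pools occurrences of {@code term} (expected in lower case) across all chunks, counting each
     * occurrence with its chunk's class weight (1.0 if the class is not listed), then applies
     * sqrt tf damping and a 1/sqrt(length) norm computed over the whole document.
     */
    static double termWeight(List<Chunk> doc, String term, Map<String, Double> classWeights) {
        double weightedFreq = 0;
        int docLength = 0;
        for (Chunk c : doc) {
            double w = classWeights.getOrDefault(c.label(), 1.0);
            for (String tok : c.text().toLowerCase().split("\\s+")) {
                docLength++;
                if (tok.equals(term)) weightedFreq += w;
            }
        }
        return Math.sqrt(weightedFreq) / Math.sqrt(docLength);
    }

    public static void main(String[] args) {
        List<Chunk> doc = List.of(
            new Chunk("methods", "protein binding assay"),
            new Chunk("results", "protein levels rose"));
        System.out.println(termWeight(doc, "protein", Map.of()));               // all classes weighted 1.0
        System.out.println(termWeight(doc, "protein", Map.of("results", 3.0))); // "results" weighted 3x
    }
}
```

With equal weights, the two occurrences of "protein" in six tokens give √2 / √6 ≈ 0.577; weighting the "results" class three times as heavily gives √4 / √6 ≈ 0.816, while the length norm stays the same because it is computed over the whole text rather than per class.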


On 15 January 2015 at 23:00, Jack Krupansky-3 [via Lucene] <
ml-node+s472066n4179925h74@n3.nabble.com> wrote:

> File a Jira for this particular doc fix since it is significant and not
> just mere wordsmithing. Better yet, submit a patch since that's Javadoc,
> although the exact form of the doc fix might be debatable, so a general
> description of the problem should be sufficient, unless you feel
> motivated.
>
> -- Jack Krupansky
>
> On Thu, Jan 15, 2015 at 11:23 AM, danield <[hidden email]> wrote:
>
> > Hi Mike,
> >
> > Thank you for your reply. Yes, I had thought of this, but it does not
> > solve my problem: the term frequencies, and therefore the results, will
> > still be wrong, because prepending or appending a string to a term still
> > makes it a different term.
> >
> > Similarly, I could use regex queries, but again that doesn't fix the TF
> > issue. I am not speaking hypothetically: I have experimental evidence
> > that this doesn't work (the precision for my task goes down in my
> > experiments).
> >
> > Also, I agree that when your fields are genuinely different, as with
> > /title/, /author/ and /text/, normalizing by field length makes sense.
> > In my case, however, I have many fields, and they are all chunks of a
> > larger text (extracted sentences that have been labelled with a number
> > of different classes). In my experiments I am trying to establish
> > whether weighting sentences in different classes differently leads to
> > more relevant results.
> >
> > This also doesn't change the fact that the documentation is wrong! Any
> > ideas on how to fix it?
> > Daniel
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4179834.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
>
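The term-frequency objection in the quoted reply can be illustrated with a small pure-Java sketch. It assumes the suggestion being answered was to index every chunk into a single field with its class label glued onto each token; the `label_token` scheme, the `Chunk` record and the method names are hypothetical, not Lucene API. Counting occurrences both ways shows how prefixing splits one term's frequency across several distinct terms:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of pooled vs label-prefixed term frequencies. Not Lucene code. */
public class PrefixedTermFreqSketch {
    /** A labelled sentence chunk, e.g. ("methods", "protein binding assay"). */
    record Chunk(String label, String text) {}

    /** Term frequencies when every chunk's tokens are indexed as the same, unprefixed terms. */
    static Map<String, Integer> pooledTf(List<Chunk> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (Chunk c : doc)
            for (String tok : c.text().toLowerCase().split("\\s+"))
                tf.merge(tok, 1, Integer::sum);
        return tf;
    }

    /** Term frequencies when each token is prefixed with its chunk's label ("label_token"). */
    static Map<String, Integer> prefixedTf(List<Chunk> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (Chunk c : doc)
            for (String tok : c.text().toLowerCase().split("\\s+"))
                tf.merge(c.label() + "_" + tok, 1, Integer::sum);
        return tf;
    }

    public static void main(String[] args) {
        List<Chunk> doc = List.of(
            new Chunk("methods", "protein binding assay"),
            new Chunk("methods", "protein purification"),
            new Chunk("results", "protein levels rose"));
        System.out.println(pooledTf(doc).get("protein"));           // 3
        System.out.println(prefixedTf(doc).get("methods_protein")); // 2
        System.out.println(prefixedTf(doc).get("results_protein")); // 1
    }
}
```

With the classic tf = √freq, the pooled term contributes √3 ≈ 1.73, whereas a query over both prefixed terms sums √2 + √1 ≈ 2.41 before idf and norms are applied. Each prefixed term also gets its own document frequency, so the idf values change as well.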




--
View this message in context: http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4180529.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.