lucene-java-user mailing list archives

From Jack Krupansky <>
Subject Re: Similarity formula documentation is misleading + how to make field-agnostic queries?
Date Thu, 15 Jan 2015 22:58:23 GMT
File a Jira for this particular doc fix, since it is significant and not
mere wordsmithing. Better yet, submit a patch, since it's Javadoc,
although the exact form of the doc fix might be debatable, so a general
description of the problem should be sufficient unless you feel motivated.

-- Jack Krupansky

On Thu, Jan 15, 2015 at 11:23 AM, danield <> wrote:

> Hi Mike,
> Thank you for your reply. Yes, I had thought of this, but it is not a
> solution to my problem, because the term frequency, and therefore the
> results, will still be wrong: prepending or appending a string to the
> term still makes it a different term.
> Similarly, I could use regex queries, but again that doesn't fix the TF
> issue. I am not speaking hypothetically; I have experimental evidence that
> this doesn't work (i.e. the precision for my task goes down in my
> experiments).
> Also, I agree that when your fields are essentially different, as in
> /title/, /author/ and /text/, normalizing by field length makes sense. In
> my case, however, my fields are many and are all chunks of a larger text
> (extracted sentences that have been labelled with a number of different
> classes), and in the experiments I am running I am trying to establish
> whether weighting sentences in different classes differently will lead to
> increased relevance of results.
> This also doesn't change the fact that the documentation is wrong! Any
> ideas how to fix it?
> Daniel
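To make Daniel's term-frequency point concrete, here is a minimal sketch in plain Java; it does not call Lucene at all. It hand-computes the classic TF-IDF factors tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), assumed here to mirror DefaultSimilarity in Lucene 4.x, over a made-up three-document corpus. The class labels "result" and "method", the underscore prefix scheme, and all class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PrefixedTermStats {

    // Classic TF factor: square root of the raw in-document frequency
    // (assumed to match DefaultSimilarity.tf in Lucene 4.x).
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Classic IDF factor: 1 + ln(numDocs / (docFreq + 1))
    // (assumed to match DefaultSimilarity.idf in Lucene 4.x).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Counts how often a term occurs in a token stream.
    public static int freq(List<String> tokens, String term) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(term)) n++;
        }
        return n;
    }

    // Hypothetical indexing scheme: flattens labelled sentences into one token
    // stream, optionally prefixing each token with its class label ("result_foo").
    public static List<String> flatten(Map<String, List<String>> doc, boolean prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            for (String tok : e.getValue()) {
                out.add(prefix ? e.getKey() + "_" + tok : tok);
            }
        }
        return out;
    }

    // Number of documents whose flattened stream contains the term.
    public static int docFreq(List<Map<String, List<String>>> corpus, String term, boolean prefix) {
        int n = 0;
        for (Map<String, List<String>> doc : corpus) {
            if (freq(flatten(doc, prefix), term) > 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up corpus: each document maps a sentence class to its tokens.
        List<Map<String, List<String>>> corpus = List.of(
            Map.of("result", List.of("foo", "bar", "foo"), "method", List.of("foo", "baz")),
            Map.of("method", List.of("foo")),
            Map.of("result", List.of("bar")));
        int numDocs = corpus.size();
        Map<String, List<String>> doc0 = corpus.get(0);

        // Field-agnostic statistics: "foo" is one term counted across all classes.
        int f = freq(flatten(doc0, false), "foo");
        System.out.printf("foo: tf=%.3f idf=%.3f%n",
            tf(f), idf(docFreq(corpus, "foo", false), numDocs));

        // Prefixed statistics: "foo" is split into distinct terms with their own counts.
        for (String t : List.of("result_foo", "method_foo")) {
            int pf = freq(flatten(doc0, true), t);
            System.out.printf("%s: tf=%.3f idf=%.3f%n",
                t, tf(pf), idf(docFreq(corpus, t, true), numDocs));
        }
    }
}
```

With the prefixes, the single term "foo" (freq 3 in the first document, docFreq 2) becomes "result_foo" (freq 2, docFreq 1) and "method_foo" (freq 1, docFreq 2), so both the tf and the idf inputs differ from the statistics a field-agnostic query would use.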
