lucenenet-user mailing list archives

From Michael Newton <mich...@mavnn.co.uk>
Subject Re: Weighting results by 'freshness'
Date Fri, 05 May 2017 10:27:37 GMT
Thanks for the reply Itamar (and Matt and Alex!),
I did actually discover the FunctionQuery and added an "OrdFieldSource"; it
ended up looking something like this (in F#):

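    // Score every document by the ordinal of its "created" term. Terms sort
    // lexicographically, so later dates get higher ordinals.
    let mostRecent =
        Queries.Function.ValueSources.OrdFieldSource("created")
        |> Queries.Function.FunctionQuery
    mostRecent.Boost <- 0.5f
    let parser =
        QueryParsers.Classic.MultiFieldQueryParser
            (Util.LuceneVersion.LUCENE_48, [| "title"; "content" |],
             context.Analyzer)
    let parsedQuery = parser.Parse phrase
    // Weight the text match well above the freshness clause.
    parsedQuery.Boost <- 10.f
    // Both clauses are required, so each hit's score combines the two.
    let query = BooleanQuery()
    query.Add(parsedQuery, BooleanClause.Occur.MUST)
    query.Add(mostRecent, BooleanClause.Occur.MUST)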

My only issue so far is that multi-term queries sometimes skew the
relative weightings, so "freshness" can have a very variable effect on
the ordering depending on the query supplied. I'm willing to live with
this, as it mostly seems to mean that more specific queries are less
heavily weighted by freshness, which feels reasonable.

Now I'm just having a play with multi-word synonyms, which is proving
interesting; overall I'm hugely impressed with the library so far and I'm
really enjoying working on it, so many thanks for all your work!

Michael

On Fri, 5 May 2017 at 11:19 Itamar Syn-Hershko <itamar@code972.com> wrote:

> 2 notes on this:
>
> 1. You can use the date as a tie breaker as a second or third sort order.
> See for instance
> http://stackoverflow.com/questions/34035405/lucene-sort-by-score-and-then-modified-date
>
> 2. In 4.8 there is something new called FunctionQuery - if you store dates
> as numerics you get to execute a function on the score and document values
> during search. I will try to find some time to add a working example of
> this to https://github.com/synhershko/LuceneNetDemo
>
> HTH
>
> --
>
> Itamar Syn-Hershko
> Freelance Developer & Consultant
> Elasticsearch Partner
> Microsoft MVP | Lucene.NET PMC
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> http://BigDataBoutique.co.il/
>
> On Thu, Apr 6, 2017 at 1:26 AM, <alex.davidson@bluewire-technologies.com>
> wrote:
>
>> Hi Michael,
>>
>> I’d do it by storing the datestamp as a numeric field at the desired
>> resolution, and then writing a scorer which calculates the ‘age’ of a
>> document (‘now’ minus the datestamp) and *reduces* the document’s score
>> (multiplying by a weighting less than 1) based on that. Something like
>> (x/(x+age))^n should work, with x and n adjustable for the desired
>> falloff characteristics.
>>
>> Mapping DateTimeOffset->integer in a stable fashion is usually awkward
>> because future calendar changes can break stored data, but since we’re only
>> dealing with past dates here it should be simple. You could probably get
>> away with using Ticks, which is at least evenly distributed across past
>> times.
>>
>> Alex Davidson
>> Bluewire Technologies
>>
>> From: Michael Newton
>> Sent: 05 April 2017 18:39
>> To: user@lucenenet.apache.org
>> Subject: Weighting results by 'freshness'
>>
>> Hi,
>> I've managed to build a nice little Lucene.net back end with 4.8.0 and the
>> free text side of things is working really well. However, I'd like to add
>> a weighting to the queries supplied so that newer documents are ranked
>> more highly (but not to outright sort by date).
>>
>> I have a "created" field on all of my documents, which has been populated
>> using "DateTools.DateToString" with a resolution of day, but I'm uncertain
>> how to add a query which adds weight to results based on "the higher the
>> value of field 'x', the better".
>>
>> What would be the best way to go about this?
>>
>> My current query code (in F#) looks like this:
>>
>>     let parser =
>>         QueryParsers.Classic.MultiFieldQueryParser
>>             (Util.LuceneVersion.LUCENE_48, [| "title"; "content" |],
>>              context.Analyzer)
>>     let query = parser.Parse phrase
>>
>> Many thanks,
>>
>> Michael
>>
>>
>
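
The falloff Alex describes earlier in the thread, (x/(x+age))^n, can be
sketched in plain F# as follows. This is only an illustration of the
arithmetic, not code from the thread and not wired into a Lucene.NET
scorer. The names toDayNumber, ageInDays and freshnessWeight are
hypothetical, day resolution is assumed because the original "created"
field used it, and x = 30, n = 1 are arbitrary example values.

```fsharp
open System

/// Map a timestamp to a whole-day number using UTC ticks, as suggested in
/// the thread. Day resolution is an assumption, not a requirement.
let toDayNumber (stamp: DateTimeOffset) : int64 =
    stamp.UtcTicks / TimeSpan.TicksPerDay

/// Age of a document in days, clamped at zero so that a datestamp in the
/// future cannot push the weight above 1.
let ageInDays (now: DateTimeOffset) (created: DateTimeOffset) : float =
    float (toDayNumber now - toDayNumber created) |> max 0.0

/// Falloff weight (x / (x + age)) ^ n. It is 1 for a brand new document
/// and decays towards 0 as the document ages.
let freshnessWeight (x: float) (n: float) (age: float) : float =
    (x / (x + age)) ** n

// Example dates chosen for illustration: a document created 30 days
// before "now".
let now = DateTimeOffset(2017, 5, 5, 0, 0, 0, TimeSpan.Zero)
let created = DateTimeOffset(2017, 4, 5, 0, 0, 0, TimeSpan.Zero)

// 30 / (30 + 30) = 0.5, so this document would keep half of its score.
let weight = freshnessWeight 30.0 1.0 (ageInDays now created)
```

With n = 1, x is the age at which a document keeps half of its score;
raising n makes the falloff steeper, and raising x makes it gentler.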
