lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Boosting of Join Results
Date Tue, 22 Mar 2016 11:44:06 GMT
What if you nest the join inside a boost query, e.g. q=+foo {!boost ..}{!join ... v=...}

See
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser

If it works, you may vote for
https://issues.apache.org/jira/browse/SOLR-7814
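A minimal sketch of how such a nested query could be assembled, assuming the boost parser's b parameter accepts a constant such as 0.1 and that the join query is passed through a request parameter named jq (a hypothetical name chosen here). The core, field names, and server URL are taken from the quoted message below; whether the multiplier actually reaches the join score is exactly what needs testing.

```python
from urllib.parse import urlencode


def build_params(term="geschichte", factor="0.1"):
    """Build Solr request parameters that wrap the join in {!boost}.

    Assumptions: 'jq' is a hypothetical parameter name used for
    dereferencing, and 'factor' is a constant multiplier for b=.
    """
    # The join query from the original message, moved into its own parameter.
    join_query = (
        "{!join from=expandtype fromIndex=pages to=id score=max}"
        f"pageno_content:(+{term})"
    )
    return {
        # Metadata match OR the boosted join; v=$jq points at the join query.
        "q": f"(+{term}) OR _query_:\"{{!boost b={factor} v=$jq}}\"",
        "jq": join_query,
        "fl": "id,score",
    }


params = build_params()
# urlencode takes care of escaping braces, quotes and spaces.
print("http://server/solr/live/select?" + urlencode(params))
```

Adding debugQuery=true to the request and comparing the "Score based on join value" line before and after would show whether the factor is applied.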

On Tue, Mar 22, 2016 at 12:39 PM, Alena Dengler <
Alena.Dengler@bsb-muenchen.de> wrote:

> Hello,
>
> we are currently developing a combined index for book metadata and
> fulltexts. Our primary core contains metadata for ~12 million books;
> ~0.5 million of them have fulltexts, which are indexed in a secondary core.
> This secondary core has one index document per fulltext page.
> We are joining all matching fulltext pages with the book-level metadata
> in the primary core. Currently we have the problem that scores for books
> with matches from the secondary core are not comparable with scores for
> matches from metadata only. So we are trying to normalize fulltext scores
> to be on the same scale as the metadata scores for non-digitized results.
>
> This is a basic query without join using only the primary core
> (metadata):
> http://server/solr/live/select?&q=+geschichte&fl=id,score
> Top 10 result scores range from 2.0 to 1.7
>
> For fulltexts, the query is extended with a join:
>
> http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score
> Top 10 result scores range from 5.4 to 4.8 (for the first hit, 4.7 score
> points come from the joined secondary core. We would like to reduce
> this value. See explain output below [1])
>
> This difference will effectively hide any books without fulltexts from
> hitlists, which is not our goal.
>
> We tried to add Lucene boosts to the join subquery, but they do not
> have any effect on the final scores. E.g. we 'down-boost' the fulltext
> results by a factor of 0.1:
> q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages
> to=id score=max v='pageno_content:(+geschichte)^0.1'})
> But the resulting scores are the same as from the join example above.
>
> Is this the correct query syntax, or should the boost for the join
> query be put somewhere else?
>
> Thanks for any suggestions.
>
> Best Regards
> Alena
>
> [1] Explain output for the first hit of the join example query
> 5.398742 = sum of:
>   4.816505 = sum of:
>     0.07251295 = max of:
>       0.07251295 = weight(title:geschichte in 10585926) [ClassicSimilarity], result of:
>         0.07251295 = score(doc=10585926,freq=1.0), product of:
>           0.037440736 = queryWeight, product of:
>             5.1646385 = idf(docFreq=197504, maxDocs=12713278)
>             0.00724944 = queryNorm
>           1.9367394 = fieldWeight in 10585926, product of:
>             1.0 = tf(freq=1.0), with freq of:
>               1.0 = termFreq=1.0
>             5.1646385 = idf(docFreq=197504, maxDocs=12713278)
>             0.375 = fieldNorm(doc=10585926)
>       0.005904072 = weight(free_search:geschichte in 10585926) [ClassicSimilarity], result of:
>         0.005904072 = score(doc=10585926,freq=2.0), product of:
>           0.022005465 = queryWeight, product of:
>             3.035471 = idf(docFreq=1660594, maxDocs=12713278)
>             0.00724944 = queryNorm
>           0.26830027 = fieldWeight in 10585926, product of:
>             1.4142135 = tf(freq=2.0), with freq of:
>               2.0 = termFreq=2.0
>             3.035471 = idf(docFreq=1660594, maxDocs=12713278)
>             0.0625 = fieldNorm(doc=10585926)
>     4.743992 = Score based on join value 957245
>   0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity], result of:
>     0.58188105 = score(doc=10585926,freq=1.0), product of:
>       0.4592555 = queryWeight, product of:
>         50.0 = boost
>         1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
>         0.00724944 = queryNorm
>       1.2670095 = fieldWeight in 10585926, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
>         1.0 = fieldNorm(doc=10585926)
>   3.5596997E-4 = FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)), product of:
>     0.00491031 = 1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
>     0.0724944 = boost
>     1.0 = queryNorm
>
>
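
For reference, the top-level sum in the explain output [1] can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the join_factor argument is a hypothetical multiplier on the join score, and the calculation assumes the other clauses (and queryNorm) stay unchanged, which may not hold in practice.

```python
# Top-level components copied from the explain output [1].
metadata_score = 0.07251295     # "max of" the title / free_search matches
join_score = 4.743992           # "Score based on join value 957245"
status_score = 0.58188105       # weight(statusband:F ...)
freshness_score = 3.5596997e-4  # FunctionQuery on the freshness date


def total(join_factor=1.0):
    """Sum the clauses, scaling only the join contribution.

    join_factor is a hypothetical multiplier; every other clause is
    assumed to stay exactly as reported in the explain output.
    """
    return (metadata_score
            + join_factor * join_score
            + status_score
            + freshness_score)


print(round(total(), 6))     # unscaled: reproduces the reported total
print(round(total(0.1), 6))  # join contribution scaled by 0.1
```

With the join scaled by 0.1 the total comes out at roughly 1.13, so under these assumptions a factor that small would push fulltext hits below the 1.7 to 2.0 range reported for metadata-only results; a milder factor may be closer to the goal.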


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
