lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-9193) Add scoreNodes Streaming Expression
Date Tue, 05 Jul 2016 17:37:11 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362833#comment-15362833 ]

Joel Bernstein edited comment on SOLR-9193 at 7/5/16 5:36 PM:
--------------------------------------------------------------

Added a new test using the termFreq param and added some error handling. The link above incorporates
these changes.

This ticket is pretty close to being ready. I'll do some testing at scale and see if this
turns up any issues.


was (Author: joel.bernstein):
Added a new test using the termFreq param and added some error handling. The link above incorporates
these changes.

This ticket is pretty close being ready. I'll do some testing at scale and see if this turns
up any issues.

> Add scoreNodes Streaming Expression
> -----------------------------------
>
>                 Key: SOLR-9193
>                 URL: https://issues.apache.org/jira/browse/SOLR-9193
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.2
>
>         Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will decorate a
gatherNodes expression and use a tf-idf scoring algorithm to score the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is similar in nature
to tf in search ranking, where the number of times a node appears in the traversal represents
the tf. But this skews recommendations towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf and idf. This
will provide a boost to nodes that appear less frequently in the index. 
> The scoreNodes expression will gather the idf's from the shards for each node emitted
by the underlying gatherNodes expression. It will then assign the score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The docFreq of
the node across the entire collection will be added to each node in the *docFreq* field. Other
streaming expressions can then perform a ranking based on the nodeScore or compute their own
score using the docFreq.
> Proposed syntax:
> {code}
> top(n="10",
>       sort="nodeScore desc",
>       scoreNodes(gatherNodes(...))) 
> {code}
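
To illustrate the scoring idea described in the ticket, here is a minimal sketch in Python rather than the Solr implementation. The ticket does not state the exact formula, so the smoothed idf used here, log((numDocs + 1) / (docFreq + 1)), is an assumption, as are the helper names score_nodes and top and the use of a "count(*)" field as the term frequency.

```python
import math


def score_nodes(nodes, num_docs, term_freq_field="count(*)"):
    """Return copies of the nodes with a nodeScore field added.

    Assumptions (not taken from the ticket): each node already carries
    its docFreq, and the score is tf * log((num_docs + 1) / (docFreq + 1)).
    """
    scored = []
    for node in nodes:
        tf = node[term_freq_field]  # times the node appeared in the traversal
        df = node["docFreq"]        # documents in the collection containing the node
        idf = math.log((num_docs + 1) / (df + 1))
        scored.append({**node, "nodeScore": tf * idf})
    return scored


def top(nodes, n, field="nodeScore"):
    """Keep the n highest-scoring nodes, mirroring sort="nodeScore desc"."""
    return sorted(nodes, key=lambda node: node[field], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical traversal output: one frequent node, one rare node.
    gathered = [
        {"node": "popular", "count(*)": 5, "docFreq": 900},
        {"node": "niche", "count(*)": 3, "docFreq": 9},
    ]
    for node in top(score_nodes(gathered, num_docs=999), n=10):
        print(node["node"], round(node["nodeScore"], 3))
```

In this made-up data the rare node ranks first even though it appears fewer times in the traversal, which is the boost for less frequent nodes that the description refers to.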



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

