From: Tri Cao
To: java-user@lucene.apache.org
Subject: Re: custom scoring
Date: Mon, 09 Apr 2012 05:49:40 +0000 (GMT)

Hi,

After reading through the IndexSearcher code, it seems I have to do the following:

- implement a custom Collector to collect not just the doc IDs and scores, but the fields I care about as well
- extend ScoreDoc to hold the extra fields
- when I get back a TopDocs from a search() call, go through the TopDocs and apply the constraints I need
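
The last step could look roughly like the sketch below. It is plain Java with no Lucene dependency: `Hit` is a hypothetical stand-in for the extended ScoreDoc, and `ConstrainedTopHits`, `select`, `caps`, and `pageSize` are names made up for illustration, not Lucene API. It assumes the hits arrive already sorted by descending score and simply skips any hit whose color has reached its cap, so the surviving hits stay in relevance order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical post-processing step; none of these names are Lucene API. */
final class ConstrainedTopHits {

    /** Stand-in for an extended ScoreDoc that also carries the "color" field. */
    static final class Hit {
        final int doc;
        final float score;
        final String color;

        Hit(int doc, float score, String color) {
            this.doc = doc;
            this.score = score;
            this.color = color;
        }
    }

    /**
     * Walks the hits in descending score order and keeps each one unless its
     * color has already reached its cap. Colors without a cap are unlimited.
     * Stops as soon as pageSize hits have been kept.
     */
    static List<Hit> select(List<Hit> sortedByScoreDesc,
                            Map<String, Integer> caps,
                            int pageSize) {
        List<Hit> page = new ArrayList<>();
        Map<String, Integer> used = new HashMap<>();
        for (Hit hit : sortedByScoreDesc) {
            if (page.size() == pageSize) {
                break;
            }
            int limit = caps.getOrDefault(hit.color, Integer.MAX_VALUE);
            int seen = used.getOrDefault(hit.color, 0);
            if (seen < limit) {
                page.add(hit);
                used.put(hit.color, seen + 1);
            }
        }
        return page;
    }
}
```

With caps of 3 for "red" and 4 for "blue" and a page size of 16, this keeps the three best red docs and the four best blue docs and fills the remaining slots with the best-scoring docs of other colors. Because capped docs are skipped, the search would have to collect more than 16 hits up front for the page to fill, which is where the extra cost would come from.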

I think this will work, but I have some concerns about performance. What do you think?

Thanks,
Tri.

On Apr 06, 2012, at 10:06 AM, Tri Cao <tmcao@me.com> wrote:

Hi all,

What would be the best approach for custom scoring that requires a "global" view of the result set? For example, I have a field called "color", and I would like to enforce constraints such as at most 3 docs with color:red and at most 4 docs with color:blue in the first 16 hits. The items should still be sorted by their relevance scores after the constraints are applied.

Thanks,

Tri.