Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of mamoabeng@vjoon.com designates
 212.60.17.50 as permitted sender)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_2BBC47F9-28D6-4976-B949-70EE9C76CCE8"
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: What is the best way to aggregate scores for sets of documents?
From: Manuel Amoabeng <mamoabeng@vjoon.com>
In-Reply-To: <527B90BE.6080900@gmail.com>
Date: Thu, 7 Nov 2013 14:17:36 +0100
Cc: java-user@lucene.apache.org
Message-Id: <B561D133-B767-4028-95B7-85969C071DE1@vjoon.com>
References: <373BEBAD-4106-46BF-8A6F-C9407B26881E@vjoon.com>
 <527B90BE.6080900@gmail.com>
To: Alan Burlison <alan.burlison@gmail.com>

--Apple-Mail=_2BBC47F9-28D6-4976-B949-70EE9C76CCE8
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

Sounds good, but wouldn't the aggregated scores of documents consisting of
many sub-documents potentially be greater than the scores of docs with very
few sub-documents even if the overall content is equal?

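To make the concern concrete, here is a minimal sketch in plain Java (no
Lucene calls) of the "group by parent and add up" step as I understand it.
It assumes each hit carries the id of its logical parent document and a
score. The Hit class, the ids "split" and "whole", and the score values are
made up for illustration; they are not real Lucene scores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScoreAggregation {

    // Hypothetical stand-in for a search hit: the id of the logical parent
    // document plus the score of one matching sub-document.
    static final class Hit {
        final String parentId;
        final float score;

        Hit(String parentId, float score) {
            this.parentId = parentId;
            this.score = score;
        }
    }

    // Group hits by parent id and add up the sub-document scores.
    static Map<String, Float> sumByParent(List<Hit> hits) {
        Map<String, Float> totals = new LinkedHashMap<>();
        for (Hit hit : hits) {
            totals.merge(hit.parentId, hit.score, Float::sum);
        }
        return totals;
    }

    // Parent ids ordered by descending aggregated score.
    static List<String> rank(Map<String, Float> totals) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Float> entry : entries) {
            ids.add(entry.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        // "split": the content is spread over four sub-documents
        // (illustrative scores only).
        for (int i = 0; i < 4; i++) {
            hits.add(new Hit("split", 0.5f));
        }
        // "whole": comparable content held in a single sub-document
        // (illustrative score only).
        hits.add(new Hit("whole", 1.5f));

        Map<String, Float> totals = sumByParent(hits);
        System.out.println(totals);
        System.out.println(rank(totals));
    }
}
```

With these made-up numbers "split" totals 2.0 and "whole" totals 1.5, so the
parent with more sub-documents ranks first simply because it has more hits
to add up.
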
Thanks,

Manuel

On 07.11.2013, at 14:08, Alan Burlison <alan.burlison@gmail.com> wrote:

> On 07/11/2013 10:59, Manuel Amoabeng wrote:
>
>> Is there a way to aggregate the scores for logically connected
>> ScoreDocs so that the result would be similar to the score a single
>> document containing all matched content would have gotten?
>
> I did something similar by just post-processing the query results,
> grouping by the upper-level construct and adding up all the scores for
> the sub-documents, then sorting by aggregated score. Crude, but gives
> good relevancy in the results.
>
> --
> Alan Burlison
> --
>


--Apple-Mail=_2BBC47F9-28D6-4976-B949-70EE9C76CCE8--