Message-ID: <359a92830612161320q272ca592r4013cce354b58ddc@mail.gmail.com>
Date: Sat, 16 Dec 2006 16:20:29 -0500
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: sorting by per doc hit count

Well, if you're not interested in doing much in the way of complex queries,
you could use TermDocs/TermEnum (particularly look at TermDocs) to count the
number of times a term appears in each document. I think you'll be surprised
at how quickly you can get this info.

Making your own scorer seems like a reasonable approach also; you could just
return the term frequency (see TermDocs), although I admit I'm staying away
from messing with the relevancy scorers, so I'm not speaking from experience.

Others who know more are going to have to weigh in on FunctionQuery.

Warning: I've just been in some (non-Lucene) code that tried to do its own
arbitrarily complex boolean logic by counting term frequency. Don't go there
if you want to keep your work minimal. In fact, I'd recommend against going
there at all. If you can restrict the allowed syntax to simple AND, you'd be
all set (simple OR would be OK too). I suspect that as soon as you start even
combining the two, allowing grouping, the effort increases dramatically. Which
probably argues for making your own scorer that just deals with frequency.

Best
Erick

On 12/16/06, Mark Miller <markrmiller@gmail.com> wrote:
>
> I have not really looked into this yet, but maybe you can save me some time
> -- Is it feasible/simple to sort by the number of hits found per document?
> Would this require changing the scoring system (remove idf etc etc) and
> doing a normal relevancy search? Could it be done with functionquery? Any
> Hints? If it is a lot of work I am not interested in doing it, but if it is
> somewhat simple it would make a few customers feel fuzzy.
>
> Thanks,
>
> Mark