Date: Wed, 7 Feb 2007 16:27:24 +0100
From: csahat
To: java-user@lucene.apache.org
Subject: Re: Counting term frequency without using Explanation

Hi Erick,

Thanks for your help. I've found it now. Maybe I will contribute an article
or FAQ entry about this, since many newbies like me ask about things like
this.

On 2/7/07, Erick Erickson wrote:
>
> Before you go too far down this path, please consider what a "hit" is.
> It's more complicated than you think.
>
> If all you want to do is count up the number of times any term appears in
> the document, it's not too hard. You should be able to use a
> TermEnum/TermDocs process to count them.
>
> TermDocs should work: just seek to a term, skip to the document number
> (which you'll have to get somewhere else), and keep adding to your count
> while the docid is the same as your target. Repeat for each term.
>
> But it's a much more complicated story if you want to accurately reflect a
> query. For instance, consider a near query, that is, terms within, say, 3
> of each other. If you do something like the above, you'll present "hits"
> that aren't real. For instance...
>
>   a b c d e f g h i j a
>
> If you search for a and c within 3 of each other, is this one hit? Two? It
> definitely isn't three, which is what you'd get if you just counted the
> occurrences of the terms a, b... What about a NOT clause? How does a
> phrase query get counted?
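Erick's "a b c d e f g h i j a" example can be made concrete with a small sketch. It does not use Lucene at all: the class HitCounting and its methods are hypothetical, one document is modeled as a plain token array, and it assumes one particular definition of a near "hit" (each a/c position pair at distance 3 or less, in either order). Lucene's own proximity queries may count matches differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: contrasts naive term counting with proximity "hits" on one token stream. */
final class HitCounting {

    /** Naive count: total occurrences of each query term, summed. */
    static int naiveCount(String[] tokens, String... queryTerms) {
        int count = 0;
        for (String t : tokens) {
            for (String q : queryTerms) {
                if (t.equals(q)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** 0-based positions at which a term occurs. */
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Proximity count under an assumed definition: every (first, second)
     * position pair whose distance is at most maxDistance is one hit,
     * regardless of which term comes first.
     */
    static int nearPairs(String[] tokens, String first, String second, int maxDistance) {
        int hits = 0;
        for (int p : positions(tokens, first)) {
            for (int q : positions(tokens, second)) {
                if (Math.abs(p - q) <= maxDistance) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] doc = "a b c d e f g h i j a".split(" ");
        System.out.println("naive count of a, c: " + naiveCount(doc, "a", "c"));
        System.out.println("a near c (within 3): " + nearPairs(doc, "a", "c", 3));
    }
}
```

Run as-is, it prints a naive count of 3 against a single proximity pair (the a at position 0 and the c at position 2), which is exactly the gap Erick describes.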
>
> There have been several discussions of various aspects of this issue, but
> often in the context of highlighting. You'll probably get some good
> information from the following threads:
>
>   Counting terms' hits from phrases
>   Counting hits in a document
>
> as well as by searching the archive for highlighting and/or hitcount.
>
> Best,
> Erick
>
>
> On 2/7/07, csahat wrote:
> >
> > Hi all,
> >
> > I'm sorry if this question has already been answered on this list, but
> > I searched the list and couldn't find the answer.
> >
> > This is what I want to do:
> >
> > When the user types in a query, for example "WebSphere Java", Lucene
> > should show not only the score but also the term count per document,
> > like this:
> >
> >   doc1  0.8333  websphere=3, Java=2
> >   doc2  0.817   websphere=2, Java=2
> >
> > I already tried implementing this with TermFreqVector, but
> > TermFreqVector shows all the terms in the field, whereas I want only
> > the terms that occur in the query.
> > I also tried using TermDocs, but it always returned 0.
> >
> > I tried using the Explanation class's toString() method, but then I
> > have to "clean" the information.
> >
> > Is there any "direct" way to do this in Lucene? Or perhaps someone can
> > give me a hint?
> >
> > Thanks in advance
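For the original question (per-document counts for the query terms only), one option is to take the field's term vector and keep just the query terms. The sketch below models a term vector as two parallel arrays, the way TermFreqVector exposes getTerms() and getTermFrequencies(). The class QueryTermCounts is hypothetical, and lowercasing is an assumed stand-in for whatever analyzer the field was indexed with. A similar case mismatch (looking up "WebSphere" when the index holds "websphere") is one plausible reason a TermDocs lookup returns 0, though the thread does not confirm that.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper: keeps only the query terms from one document's term vector.
 * terms/freqs are parallel arrays, modeled on what a term vector returns.
 */
final class QueryTermCounts {

    static Map<String, Integer> countQueryTerms(String[] terms, int[] freqs, String... queryTerms) {
        // Index the vector once: term -> frequency in this document's field.
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            vector.put(terms[i], freqs[i]);
        }
        // Keep query order; lowercasing stands in for the analyzer (assumption).
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String q : queryTerms) {
            String key = q.toLowerCase(Locale.ROOT);
            result.put(key, vector.getOrDefault(key, 0));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up term vector for one document, for illustration only.
        String[] terms = {"java", "server", "websphere"};
        int[] freqs = {2, 1, 3};
        System.out.println(countQueryTerms(terms, freqs, "WebSphere", "Java"));
    }
}
```

This prints {websphere=3, java=2}. It counts raw occurrences only, so Erick's caveats about near, NOT, and phrase queries still apply.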