From: Ted Dunning
Date: Wed, 16 Dec 2009 09:26:50 -0800
Subject: Re: Frequency Term of Composite words
To: general@lucene.apache.org

You need the term frequency vector. See here:

http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexReader.html#getTermFreqVector%28int,%20java.lang.String%29

This is compatible in 3.0 as well:

http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/index/IndexReader.html#getTermFreqVector%28int,%20java.lang.String%29

Note the package change.

On Wed, Dec 16, 2009 at 7:34 AM, Antonio Calò wrote:

> Hi all,
>
> I hope that you can help me with this.
>
> I'm looking for a fast way to obtain, for a given word, its term frequency
> (I mean how many times it occurs in a single doc). I've looked through the
> mail archive and the LIA (Lucene in Action) book and I found something like
> this:
>
> IndexSearcher index = new IndexSearcher(invertedIndexinRam);
> Term term = new Term("doc", "quick");
> int occurrence = index.docFreq(term);
>
> OK, occurrence holds the number of occurrences of the word "quick" in the
> index (in my case the index will contain only one document, for example
> "the quick brown fox jumps over the lazy dog"). In this case occurrence
> will be 1. :)
>
> But now I need to retrieve the number of occurrences of a composite word,
> for example "quick brown fox", and I'm having trouble working out how to
> do this.
>
> Thanks in advance for your help.
>
> Best Regards.
>
> Antonio
>
> --
> Antonio Calò
> ------------------------------------------
> Software Developer Engineer
> @ Intellisemantic
> Mail anton.calo@gmail.com
> Tel. 011-56.90.429
> ------------------------------------------

-- 
Ted Dunning, CTO
DeepDyve
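
To make the distinction in this thread concrete, here is a minimal plain-Java sketch of what a per-document term frequency vector holds, and of how a composite word such as "quick brown fox" could be counted. It does not call Lucene: the class TermFrequencySketch and its methods tokenize, termFrequencies and phraseFrequency are hypothetical names introduced only for illustration, and the simple lowercase-and-split tokenizer is an assumption standing in for whatever analyzer the real index uses. A term frequency vector counts single terms only, so the phrase count here is obtained by comparing runs of consecutive tokens.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names are part of Lucene.
final class TermFrequencySketch {

    // Assumed tokenizer: lowercase, then split on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Per-document term -> count map, conceptually what a term frequency vector stores.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : tokenize(text)) {
            freq.merge(t, 1, Integer::sum);
        }
        return freq;
    }

    // Counts positions where the phrase's tokens appear consecutively (overlaps included).
    static int phraseFrequency(String text, String phrase) {
        List<String> doc = tokenize(text);
        List<String> words = tokenize(phrase);
        if (words.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + words.size() <= doc.size(); i++) {
            if (doc.subList(i, i + words.size()).equals(words)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps over the lazy dog";
        System.out.println("quick: " + termFrequencies(doc).get("quick"));
        System.out.println("the: " + termFrequencies(doc).get("the"));
        System.out.println("quick brown fox: " + phraseFrequency(doc, "quick brown fox"));
    }
}
```

In a real index the single-term counts would come from the vector returned by IndexReader.getTermFreqVector rather than from re-tokenizing the text; the sketch only shows why single-term counts alone are not enough for a multi-word phrase.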