From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Using a TermFreqVector to get counts of all words in a document
Date: Wed, 20 Oct 2010 16:20:13 -0400

On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote:

> Uwe
>
> Thanks - I figured that bit out. I'm a Lucene 'newbie'.
>
> What I would like to know though is if it is practical to search a single
> document of one field simply by doing this:
>
> IndexReader trd = IndexReader.open(index);
> TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
> String[] terms = tfv.getTerms();
> int[] freqs = tfv.getTermFrequencies();
> for (int i = 0; i < tfv.getTerms().length; i++) {
>     System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
> }
> trd.close();
>
> where docId is set to 0.
>
> The code works but can this be improved upon at all?
>
> My situation is where I don't want to calculate the number of documents with
> a particular string. Rather I want to get counts of individual words in a
> field in a document. So I can concatenate the strings before passing it to
> Lucene.

Can you describe the bigger problem you are trying to solve? This looks like a classic XY problem: http://people.apache.org/~hossman/#xyproblem

What you are doing above will work OK for what you describe (up to the "passing it to Lucene" part), but you probably should explore the use of the TermVectorMapper which provides a callback mechanism (similar to a SAX parser) that will allow you to build your data structures on the fly instead of having to serialize them into two parallel arrays and then loop over those arrays to create some other structure.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com
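
To make the callback idea concrete, here is a rough, Lucene-free sketch of the pattern described above: the reader pushes each (term, frequency) pair to a callback, and the callback fills the target structure directly, so no intermediate parallel arrays have to be walked afterwards. Everything in it is an assumption made for illustration: TermCallback, replay and countTerms are hypothetical names, not Lucene types, and replay merely stands in for the index handing over a stored term vector. In a real Lucene 3.x program the corresponding hook would be a TermVectorMapper subclass passed to the reader; check the javadoc of your version for the exact signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TermCountSketch {

    /** Hypothetical per-term callback; a stand-in for the mapper idea, not a Lucene type. */
    public interface TermCallback {
        void onTerm(String term, int frequency);
    }

    /**
     * Hypothetical stand-in for the index side: pushes each stored
     * (term, frequency) pair to the callback, one at a time.
     */
    public static void replay(String[] terms, int[] freqs, TermCallback callback) {
        for (int i = 0; i < terms.length; i++) {
            callback.onTerm(terms[i], freqs[i]);
        }
    }

    /** Builds a term -> count map on the fly, as each callback arrives. */
    public static Map<String, Integer> countTerms(String[] terms, int[] freqs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // merge() sums frequencies if the same term is reported more than once.
        replay(terms, freqs, (term, frequency) -> counts.merge(term, frequency, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one document's "title" field.
        Map<String, Integer> counts = countTerms(
                new String[] {"lucene", "term", "vector"},
                new int[] {2, 1, 3});
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println("Term " + e.getKey() + " Freq: " + e.getValue());
        }
    }
}
```

The caller only decides what to do with each term as it arrives; swapping the map for a sorted set or a running total would mean changing the lambda and nothing else.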