Subject: Re: Using a TermFreqVector to get counts of all words in a document
From: Grant Ingersoll <gsingers@apache.org>
In-Reply-To: <006e01cb7088$05fc38f0$11f4aad0$@pipex.com>
Date: Wed, 20 Oct 2010 16:20:13 -0400
Message-Id: <2A9868DB-6AF6-4387-9BA7-8D0CD373B5D7@apache.org>
References: <006001cb7083$d25abde0$771039a0$@pipex.com>
 <013c01cb7086$245b8fa0$6d12aee0$@thetaphi.de>
 <006e01cb7088$05fc38f0$11f4aad0$@pipex.com>
To: java-user@lucene.apache.org


On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote:

> Uwe
>
> Thanks - I figured that bit out. I'm a Lucene 'newbie'.
>
> What I would like to know though is if it is practical to search a single
> document of one field simply by doing this:
>
> IndexReader trd = IndexReader.open(index);
>        TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
>        String[] terms = tfv.getTerms();
>        int[] freqs = tfv.getTermFrequencies();
>        for (int i = 0; i < tfv.getTerms().length; i++) {
>            System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
>        }
>        trd.close();
>
> where docId is set to 0.
>
> The code works but can this be improved upon at all?
>
> My situation is where I don't want to calculate the number of documents with
> a particular string. Rather I want to get counts of individual words in a
> field in a document. So I can concatenate the strings before passing it to
> Lucene.

Can you describe the bigger problem you are trying to solve?  This looks
like a classic XY problem: http://people.apache.org/~hossman/#xyproblem

What you are doing above will work OK for what you describe (up to the
"passing it to Lucene" part), but you should probably explore the
TermVectorMapper, which provides a callback mechanism (similar to a SAX
parser) that lets you build your data structures on the fly instead of
serializing them into two parallel arrays and then looping over those
arrays to create some other structure.
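
To make the callback idea concrete, here is a minimal sketch that does not
depend on Lucene at all. TermCallback, CountingCallback, feed and countTerms
are hypothetical names invented for illustration; TermCallback only stands in
for the map(...) callback of TermVectorMapper, whose real method also receives
offset and position arguments. Against a real index you would, I believe,
extend org.apache.lucene.index.TermVectorMapper and pass an instance to
IndexReader.getTermFreqVector(docId, field, mapper); check that against the
Javadoc for your Lucene version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermCountSketch {

    /** Hypothetical stand-in for the per-term callback of a term vector mapper. */
    interface TermCallback {
        void map(String term, int frequency);
    }

    /** Builds the term -> count map directly as each callback arrives. */
    static class CountingCallback implements TermCallback {
        private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

        public void map(String term, int frequency) {
            // Sum rather than overwrite, so one instance can be reused across fields.
            Integer previous = counts.get(term);
            counts.put(term, previous == null ? frequency : previous + frequency);
        }

        Map<String, Integer> counts() {
            return counts;
        }
    }

    /** Stand-in for the reader side: pushes each stored term to the callback. */
    static void feed(String[] terms, int[] freqs, TermCallback callback) {
        for (int i = 0; i < terms.length; i++) {
            callback.map(terms[i], freqs[i]);
        }
    }

    /** Convenience wrapper: runs the callback over one term vector. */
    static Map<String, Integer> countTerms(String[] terms, int[] freqs) {
        CountingCallback callback = new CountingCallback();
        feed(terms, freqs, callback);
        return callback.counts();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countTerms(new String[] {"lucene", "search"}, new int[] {2, 1});
        System.out.println(counts); // prints {lucene=2, search=1}
    }
}
```

Because the callback adds to any existing count, the same instance can be fed
the term vectors of several fields in turn, which may remove the need to
concatenate the strings before indexing.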


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

