lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Using a TermFreqVector to get counts of all words in a document
Date Wed, 20 Oct 2010 20:20:13 GMT

On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote:

> Uwe
> 
> Thanks - I figured that bit out. I'm a Lucene 'newbie'.
> 
> What I would like to know, though, is whether it is practical to search a single
> field of one document simply by doing this:
> 
> IndexReader trd = IndexReader.open(index);
> TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
> String[] terms = tfv.getTerms();
> int[] freqs = tfv.getTermFrequencies();
> for (int i = 0; i < terms.length; i++) {
>     System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
> }
> trd.close();
> 
> where docId is set to 0.
> 
> The code works but can this be improved upon at all?
> 
> My situation is that I don't want to count the documents containing a
> particular string. Rather, I want counts of the individual words in one
> field of a single document, so that I can concatenate the strings before
> passing it to Lucene.
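The quoted loop walks two parallel arrays by index. If the end goal is a per-document word count, one option is to zip them into a single map. This is a plain-Java sketch so it runs without an index: `TermCounts` and `toMap` are hypothetical names, and the literal arrays stand in for what `tfv.getTerms()` and `tfv.getTermFrequencies()` return. Since `getTermFreqVector` may return null when the field was indexed without term vectors, it is worth checking `tfv` for null before using it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: zips the parallel arrays returned by
// TermFreqVector.getTerms() and getTermFrequencies() into one map.
final class TermCounts {
    private TermCounts() {}

    // Returns term -> frequency, ordered alphabetically by term.
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs differ in length");
        }
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int i = 0; i < terms.length; i++) {
            counts.put(terms[i], freqs[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        // In real code these would come from a non-null TermFreqVector:
        //   TermFreqVector tfv = reader.getTermFreqVector(docId, "title");
        //   if (tfv != null) { toMap(tfv.getTerms(), tfv.getTermFrequencies()); }
        String[] terms = {"search", "lucene", "newbie"};
        int[] freqs = {3, 2, 1};
        for (Map.Entry<String, Integer> e : toMap(terms, freqs).entrySet()) {
            System.out.println("Term " + e.getKey() + " Freq: " + e.getValue());
        }
    }
}
```

A TreeMap is used only so the output order is predictable; a HashMap works just as well if order does not matter.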

Can you describe the bigger problem you are trying to solve?  This looks like a classic XY
problem: http://people.apache.org/~hossman/#xyproblem

What you are doing above will work OK for what you describe (up to the "passing it to Lucene"
part), but you should probably explore TermVectorMapper, which provides a callback mechanism
(similar to a SAX parser). It lets you build your own data structures on the fly, instead of
having the vector serialized into two parallel arrays that you then loop over to create some
other structure.
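A rough sketch of the TermVectorMapper route, assuming the Lucene 2.9/3.x API: `TermVectorMapper` declares `setExpectations(String, int, boolean, boolean)` and `map(String, int, TermVectorOffsetInfo[], int[])`, and `IndexReader.getTermFreqVector(int, String, TermVectorMapper)` drives it (later Lucene versions replaced this API). `FrequencyMapper` and `countTerms` are hypothetical names. Each term goes straight into a map as Lucene reads it, so your code never loops over parallel arrays. Compiling it requires the Lucene 3.x jar on the classpath.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermVectorMapper;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.store.Directory;

// Hypothetical mapper: records term -> frequency as Lucene streams a term vector.
class FrequencyMapper extends TermVectorMapper {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    FrequencyMapper() {
        // Only frequencies are needed, so ask Lucene to skip positions and offsets.
        super(true, true);
    }

    @Override
    public void setExpectations(String field, int numTerms,
                                boolean storeOffsets, boolean storePositions) {
        // Called before the terms of a field are mapped; nothing to prepare here.
    }

    @Override
    public void map(String term, int frequency,
                    TermVectorOffsetInfo[] offsets, int[] positions) {
        // Called once per distinct term in the field's term vector.
        counts.put(term, frequency);
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Usage sketch: counts for one field of one document. If the field has no
    // stored term vector, map() should never be called and the map stays empty.
    static Map<String, Integer> countTerms(Directory index, int docId, String field)
            throws IOException {
        IndexReader reader = IndexReader.open(index);
        try {
            FrequencyMapper mapper = new FrequencyMapper();
            reader.getTermFreqVector(docId, field, mapper);
            return mapper.getCounts();
        } finally {
            reader.close();
        }
    }
}
```

Because the mapper owns the target structure, swapping the HashMap for whatever you actually need downstream is a local change.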


--------------------------
Grant Ingersoll
http://www.lucidimagination.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

