lucene-java-user mailing list archives

From jnance <nanc...@ornl.gov>
Subject Re: Searching for instances within a document
Date Thu, 10 Jul 2008 12:23:02 GMT

Yes, the term frequency vector is exactly what I needed. Thanks!

-James


Ajay Lakhani wrote:
> 
> Hi James,
> 
> Try this:
> 
>     Searcher searcher = new IndexSearcher(dir);
>     QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>     Query query = parser.parse(queryString);
> 
>     // collect the terms the query searches for
>     HashSet queryTerms = new HashSet();
>     query.extractTerms(queryTerms);
> 
>     Hits hits = searcher.search(query);
> 
>     IndexReader reader = IndexReader.open(dir);
> 
>     for (int i = 0; i < hits.length(); i++) {
>       Document d = hits.doc(i);
>       Field fid = d.getField("id");
>       Field ftitle = d.getField("title");
>       System.out.println("id is " + fid.stringValue());
>       System.out.println("title is " + ftitle.stringValue());
> 
>       TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), "content");
>       if (tfv == null) {
>         continue; // no term vector was stored for "content" in this document
>       }
>       String[] terms = tfv.getTerms();
>       int[] freqs = tfv.getTermFrequencies(); // freqs[j] is the count for terms[j]
> 
>       // for each term in the query
>       for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
>         Term term = (Term) iter.next();
> 
>         // for each term in the vector
>         for (int j = 0; j < terms.length; j++) {
>           if (terms[j].equals(term.text())) {
>             System.out.println("frequency of term [" + term.text() + "] is " + freqs[j]);
>           }
>         }
>       }
>     }
>     reader.close();
> 
> Let me know if this helps.
> Cheers
> AJ
> 
> 2008/7/10 Karl Wettin <karl.wettin@gmail.com>:
> 
>> Maybe you are looking for the document TermFreqVector?
>>
>>
>>       karl
>>
>> On 9 Jul 2008, at 15:49, jnance wrote:
>>
>>
>>> Hi,
>>>
>>> I am indexing lots of text files and need to see how many times a
>>> certain word comes up in each text file. Right now I have this method,
>>> "search":
>>>
>>> static void search(Searcher searcher, String queryString)
>>>         throws ParseException, IOException {
>>>     QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>>>     Query query = parser.parse(queryString);
>>>     Hits hits = searcher.search(query);
>>>
>>>     int hitCount = hits.length();
>>>     if (hitCount == 0) {
>>>         System.out.println("0 documents contain the word \"" + queryString + ".\"");
>>>     }
>>>     else {
>>>         System.out.println(hitCount + " documents contain the word \"" + queryString + ".\"");
>>>     }
>>> }
>>>
>>> This tells me how many documents contain the word I'm looking for... but
>>> how do I get it to tell me how many times the word occurs within that
>>> document?
>>>
>>> Thanks,
>>>
>>> James
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18362075.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
> 
> 
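
For readers who want to see what the term frequency vector in the quoted code holds, here is a minimal sketch in plain Java with no Lucene dependency. It builds the same two parallel arrays that getTerms() and getTermFrequencies() return (sorted unique terms and their counts) and looks up one query term. The class name TermFrequencySketch, the nested TermVector class, and the build and frequencyOf helpers are all hypothetical, and the lowercase-and-split tokenizer is only a rough stand-in for StandardAnalyzer, so this illustrates the idea and is not how Lucene implements it. In Lucene 2.x, getTermFreqVector returns null unless the field was indexed with term vectors enabled (for example with Field.TermVector.YES), so a real index needs that setting before the quoted code reports anything.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencySketch {

    /** Sorted unique terms and their counts, held as two parallel arrays. */
    static final class TermVector {
        final String[] terms;
        final int[] freqs;

        TermVector(String[] terms, int[] freqs) {
            this.terms = terms;
            this.freqs = freqs;
        }

        /** Returns how often the term occurs, or 0 if it is absent. */
        int frequencyOf(String term) {
            // terms is sorted, so a binary search replaces the linear scan
            int idx = Arrays.binarySearch(terms, term);
            return idx >= 0 ? freqs[idx] : 0;
        }
    }

    /** Assumed tokenizer: lowercase, then split on anything that is not a-z or 0-9. */
    static TermVector build(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // TreeMap iterates in sorted key order, so terms comes out sorted
        String[] terms = counts.keySet().toArray(new String[0]);
        int[] freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
        return new TermVector(terms, freqs);
    }

    public static void main(String[] args) {
        TermVector tv = build("The cat sat. The other cat ran; the dog slept.");
        System.out.println("frequency of term [cat] is " + tv.frequencyOf("cat"));   // 2
        System.out.println("frequency of term [the] is " + tv.frequencyOf("the"));   // 3
        System.out.println("frequency of term [bird] is " + tv.frequencyOf("bird")); // 0
    }
}
```

Because the terms array is sorted, the lookup can use a binary search instead of the nested loop in the quoted code. The per-document count it returns is the number James asked for.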

-- 
View this message in context: http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18381743.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

