lucene-java-user mailing list archives

From "Ajay Lakhani" <lakhani.a...@googlemail.com>
Subject Re: Searching for instances within a document
Date Thu, 10 Jul 2008 07:43:43 GMT
Hi James,

Try this:

    Searcher searcher = new IndexSearcher(dir);
    QueryParser parser = new QueryParser("content", new StandardAnalyzer());
    Query query = parser.parse(queryString);

    HashSet queryTerms = new HashSet();
    query.extractTerms(queryTerms);

    Hits hits = searcher.search(query);

    IndexReader reader = IndexReader.open(dir);

    for (int i = 0; i < hits.length(); i++) {
      Document d = hits.doc(i);
      Field fid = d.getField("id");
      Field ftitle = d.getField("title");
      System.out.println("id is " + fid.stringValue());
      System.out.println("title is " + ftitle.stringValue());

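      // Term vectors exist only if "content" was indexed with them
      // (e.g. Field.TermVector.YES); otherwise this call returns null.
      TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), "content");
      if (tfv == null) continue;
      String[] terms = tfv.getTerms();
      int[] freqs = tfv.getTermFrequencies(); // parallel to terms: freqs[j] counts terms[j]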

      // for each term in the query
      for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
        Term term = (Term) iter.next();

        // for each term in the vector
        for (int j = 0; j < terms.length; j++) {
          if (terms[j].equals(term.text())) {
            System.out.println("frequency of term [" + term.text() + "] is " + freqs[j]);
          }
        }
      }
    }
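
The nested loop above compares every query term against every term in the vector. Here is a rough plain-Java sketch of the same lookup using a map, in case it is useful. It has no Lucene dependency: TermFreqLookup, countQueryTerms and the sample arrays are made up for illustration, and the two arrays simply stand in for what getTerms() and getTermFrequencies() return.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: stands in for the nested loop
// over the arrays returned by getTerms() and getTermFrequencies().
final class TermFreqLookup {

    // terms and freqs are parallel arrays: freqs[i] is the count of terms[i].
    static Map<String, Integer> countQueryTerms(String[] terms, int[] freqs,
                                                List<String> queryTerms) {
        // Index the vector once so each query term is a single map lookup.
        Map<String, Integer> byTerm = new HashMap<String, Integer>();
        for (int i = 0; i < terms.length; i++) {
            byTerm.put(terms[i], freqs[i]);
        }
        // Keep the query order; terms absent from the document count as 0.
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        for (String q : queryTerms) {
            Integer f = byTerm.get(q);
            result.put(q, f == null ? 0 : f);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data for one document's "content" field.
        String[] terms = {"apache", "lucene", "search"};
        int[] freqs = {1, 3, 2};
        Map<String, Integer> counts =
            countQueryTerms(terms, freqs, Arrays.asList("lucene", "index"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println("frequency of term [" + e.getKey() + "] is " + e.getValue());
        }
    }
}
```

With the sample data this prints a frequency of 3 for "lucene" and 0 for "index". As far as I remember, TermFreqVector also has an indexOf(String term) method that would avoid the inner loop, but please check the javadoc for your Lucene version.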

Let me know if this helps.
Cheers
AJ

2008/7/10 Karl Wettin <karl.wettin@gmail.com>:

> Maybe you are looking for the document TermFreqVector?
>
>
>       karl
>
> On 9 Jul 2008, at 15:49, jnance wrote:
>
>
>> Hi,
>>
>> I am indexing lots of text files and need to see how many times a certain
>> word comes up in each text file. Right now I have this "search" method:
>>
>> static void search(Searcher searcher, String queryString)
>>         throws ParseException, IOException {
>>     QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>>     Query query = parser.parse(queryString);
>>     Hits hits = searcher.search(query);
>>
>>     int hitCount = hits.length();
>>     if (hitCount == 0) {
>>         System.out.println("0 documents contain the word \"" + queryString + ".\"");
>>     }
>>     else {
>>         System.out.println(hitCount + " documents contain the word \"" + queryString + ".\"");
>>     }
>> }
>>
>> This tells me how many documents contain the word I'm looking for... but
>> how do I get it to tell me how many times the word occurs within that
>> document?
>>
>> Thanks,
>>
>> James
>> --
>> View this message in context:
>> http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18362075.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
