lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: How to get mapping of query terms to number of their occurrences in a doc?
Date Thu, 09 Feb 2006 19:05:20 GMT
There is a HighFreqTerms class in contrib/misc that may interest you. I modified it slightly
locally last night to limit it to a specific field, and will commit the change later.
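HighFreqTerms, as I understand it, walks the index's term dictionary and keeps the terms with the highest document frequency in a bounded priority queue. Below is a minimal plain-Java sketch of that idea restricted to one field. It is not the contrib class or the local change mentioned above: the `TermStat` list is a hypothetical in-memory stand-in for the index's term enumeration, and ranking by document frequency is an assumption about what the class reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class FieldHighFreqTerms {

    /** Hypothetical record: one term of one field and its document frequency. */
    public static final class TermStat {
        public final String field;
        public final String text;
        public final int docFreq;

        public TermStat(String field, String text, int docFreq) {
            this.field = field;
            this.text = text;
            this.docFreq = docFreq;
        }
    }

    /**
     * Returns up to n terms of the given field with the highest document
     * frequency, highest first. allTerms stands in for walking the index's
     * term dictionary.
     */
    public static List<TermStat> top(List<TermStat> allTerms, String field, int n) {
        // Min-heap on docFreq: the root is the weakest of the current top n.
        PriorityQueue<TermStat> heap = new PriorityQueue<TermStat>(
            Math.max(1, n), (a, b) -> Integer.compare(a.docFreq, b.docFreq));

        for (TermStat t : allTerms) {
            if (!t.field.equals(field)) {
                continue; // restrict the ranking to a single field
            }
            if (heap.size() < n) {
                heap.add(t);
            } else if (n > 0 && t.docFreq > heap.peek().docFreq) {
                heap.poll(); // evict the current weakest entry
                heap.add(t);
            }
        }

        // Heap order is not sorted order, so sort the survivors descending.
        List<TermStat> result = new ArrayList<TermStat>(heap);
        result.sort((a, b) -> Integer.compare(b.docFreq, a.docFreq));
        return result;
    }
}
```

The bounded heap keeps memory proportional to n rather than to the number of terms in the index, which is the point of using a priority queue here.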

Otis

----- Original Message ----
From: Dmitry Goldenberg <dmitry.goldenberg@weblayers.com>
To: java-user@lucene.apache.org
Sent: Mon 06 Feb 2006 05:34:05 PM EST
Subject: How to get mapping of query terms to number of their occurrences in a doc?

Given a query, I want to get, for each query term, the number of occurrences of that term
in a given document. I have tried the code below, and it does not seem to give reliable
results. It seems to work fine with exact matching, but as soon as stemming kicks in, all
bets are off as to the number of occurrences returned.
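Term vectors hold terms as the index-time analyzer produced them, so with a stemming analyzer a document's "searching" and "searches" may both be stored under one stem. A query term is only found if it went through the same analysis; comparing a raw word against the stored stems silently finds nothing. That is one plausible source of unreliable counts, though the message does not say how the query was built. Note also that the count for a stem covers every surface form that maps to it. The plain-Java sketch below illustrates this with a hypothetical toy suffix-stripper `stem()` (not Porter or any Lucene analyzer) applied to both sides; with Lucene itself, the analogous step would be parsing the query text with the same Analyzer used at index time.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class StemmedCounts {

    /** Hypothetical toy stemmer: strips a few English suffixes. Not Porter. */
    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 5 && w.endsWith("ing")) {
            return w.substring(0, w.length() - 3);
        }
        if (w.length() > 4 && w.endsWith("es")) {
            return w.substring(0, w.length() - 2);
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Stem -> frequency: roughly what a term vector holds if indexing used stem(). */
    public static Map<String, Integer> termVector(List<String> docTokens) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String token : docTokens) {
            tf.merge(stem(token), 1, Integer::sum);
        }
        return tf;
    }

    /** Looks up each query word after running it through the same stem(). */
    public static Map<String, Integer> countQueryWords(List<String> queryWords,
                                                       Map<String, Integer> tf) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String word : queryWords) {
            counts.put(word, tf.getOrDefault(stem(word), 0));
        }
        return counts;
    }
}
```

For the tokens "Searching", "searches", "search" and "index", the stored map holds "search" with frequency 3, so a raw lookup of "searching" misses while the stemmed lookup finds all three occurrences.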
 
Any ideas, anyone?  Can this be written in a simpler and/or more efficient way?
Thanks -
 
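      // Assumes query, docId, indexDirPath and getDirectory() are defined by
      // the surrounding code. Needs org.apache.lucene.index.IndexReader, Term,
      // TermFreqVector, org.apache.lucene.search.Query, java.util.HashSet and
      // java.util.Iterator.
      int totalOccurrences = 0;

      IndexReader reader = IndexReader.open(getDirectory(indexDirPath));
      try {
        // Rewrite first so multi-term queries (prefix, wildcard, ...) expand
        // into concrete terms that extractTerms() can report.
        Query rewritten = query.rewrite(reader);
        HashSet terms = new HashSet();
        rewritten.extractTerms(terms);

        TermFreqVector[] tfvs = reader.getTermFreqVectors(docId);
        if (tfvs != null) {

          // For each term frequency vector (i.e. for each field)
          for (int i = 0; i < tfvs.length; i++) {
            String field = tfvs[i].getField();
            String[] strTerms = tfvs[i].getTerms();
            int[] tfs = tfvs[i].getTermFrequencies();

            if (strTerms != null) {

              // For each term in the query
              for (Iterator iter = terms.iterator(); iter.hasNext();) {
                Term term = (Term) iter.next();

                // For each term in the vector
                for (int j = 0; j < strTerms.length; j++) {

                  // If the query term is found among the vector terms,
                  // add its frequency to the total
                  if (field.equals(term.field()) && strTerms[j].equals(term.text())) {
                    totalOccurrences += tfs[j];
                  }
                }
              }
            }
          }
        }
      } finally {
        reader.close();
      }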
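The loop above sums every match into a single total and compares each query term against every vector term. A sketch that returns the per-term mapping the subject line asks for, with one hash lookup per query term, could look like the plain-Java code below. `FieldVector` and `QTerm` are hypothetical stand-ins shaped like a term vector's parallel getTerms()/getTermFrequencies() arrays and a Term's field/text pair, so the real arrays could be passed in. TermFreqVector should also offer indexOf(String), which could replace the per-field hash map.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryTermCounts {

    /** Hypothetical stand-in for one field's term vector (parallel arrays). */
    public static final class FieldVector {
        final String field;
        final String[] terms;
        final int[] freqs;

        public FieldVector(String field, String[] terms, int[] freqs) {
            this.field = field;
            this.terms = terms;
            this.freqs = freqs;
        }
    }

    /** Hypothetical stand-in for a query term: field name plus term text. */
    public static final class QTerm {
        final String field;
        final String text;

        public QTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Maps each query term ("field:text") to its frequency; absent terms map to 0. */
    public static Map<String, Integer> countPerTerm(List<FieldVector> vectors,
                                                    List<QTerm> queryTerms) {
        // Index every vector once: field -> (term text -> frequency).
        Map<String, Map<String, Integer>> byField = new HashMap<String, Map<String, Integer>>();
        for (FieldVector v : vectors) {
            Map<String, Integer> tf = new HashMap<String, Integer>();
            for (int j = 0; j < v.terms.length; j++) {
                tf.put(v.terms[j], v.freqs[j]);
            }
            byField.put(v.field, tf);
        }

        // One hash lookup per query term instead of a scan over all vector terms.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (QTerm t : queryTerms) {
            Map<String, Integer> tf = byField.get(t.field);
            int freq = (tf == null) ? 0 : tf.getOrDefault(t.text, 0);
            counts.put(t.toString(), freq);
        }
        return counts;
    }

    /** Sum of the per-term counts, i.e. the original totalOccurrences. */
    public static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }
}
```

Keeping the mapping and deriving the total from it means the single-number result is still available, while each term's contribution can be inspected when a count looks wrong.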



