lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jnance <nanc...@ornl.gov>
Subject Re: Searching for instances within a document
Date Fri, 11 Jul 2008 13:28:34 GMT

The TermFrequencyVector works perfectly for normal query strings. But if I
add a wild card (*) onto words to search for different forms of the word I
get an ArrayIndexOutOfBoundsException because the index is -1. Why does this
happen? And is there anyway to avoid it?

Thanks,

James



jnance wrote:
> 
> Yes, the term frequency vector is exactly what I needed. Thanks!
> 
> -James
> 
> 
> Ajay Lakhani wrote:
>> 
>> Hi James,
>> 
>> Try this:
>> 
>>     Searcher searcher = new IndexSearcher(dir);
>>     QueryParser parser = new QueryParser("content", new
>> StandardAnalyzer());
>>     Query query = parser.parse(queryString);
>> 
>>     HashSet queryTerms = new HashSet();
>>     query.extractTerms(queryTerms);
>> 
>>     Hits hits = searcher.search(query);
>> 
>>     IndexReader reader = IndexReader.open(dir);
>> 
>>     for (int i =0; i < hits.length() ; i ++){
>>       Document d = hits.doc(i);
>>       Field fid = d.getField("id");
>>       Field ftitle = d.getField("title");
>>       System.out.println("id is " + fid.stringValue());
>>       System.out.println("title is " + ftitle.stringValue());
>> 
>>       TermFreqVector tfv = reader.getTermFreqVector(hits.id(i),
>> "content");
>>       String[] terms = tfv.getTerms();
>>       int [] freqs = tfv.getTermFrequencies();//get the frequencies
>> 
>>       // for each term in the query
>>       for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
>>         Term term = (Term) iter.next();
>> 
>>         // for each term in the vector
>>         for (int j = 0; j < terms.length; j++) {
>>           if (terms[j].equals(term.text())) {
>>             System.out.println("frequency of term ["+ term.text() +"] is
>> " +
>> freqs[j] );
>>           }
>>         }
>>       }
>>     }
>> 
>> Let me know if this helps.
>> Cheers
>> AJ
>> 
>> 2008/7/10 Karl Wettin <karl.wettin@gmail.com>:
>> 
>>> Maybe you are looking for the document TermFreqVector?
>>>
>>>
>>>       karl
>>>
>>> 9 jul 2008 kl. 15.49 skrev jnance:
>>>
>>>
>>>> Hi,
>>>>
>>>> I am indexing lots of text files and need to see how many times a
>>>> certain
>>>> word comes up in each text file. Right now I have this constructor for
>>>> "search":
>>>>
>>>> static void search(Searcher searcher, String queryString) throws
>>>> ParseException, IOException {
>>>>                 QueryParser parser = new QueryParser("content", new
>>>> StandardAnalyzer());
>>>>                 Query query = parser.parse(queryString);
>>>>                 Hits hits = searcher.search(query);
>>>>
>>>>                 int hitCount = hits.length();
>>>>                 if (hitCount == 0) {
>>>>                         System.out.println("0 documents contain the
>>>> word
>>>> \"" + queryString +
>>>> ".\"");
>>>>                 }
>>>>                 else {
>>>>                         System.out.println(hitCount + " documents
>>>> contain
>>>> the word \"" +
>>>> queryString + ".\"");
>>>>                 }
>>>>         }
>>>>
>>>> This tells me how many documents contain the word I'm looking for...
>>>> but
>>>> how
>>>> do I get it to tell me how many times the word occurs within that
>>>> document?
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18362075.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18403878.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message