lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aslam bari <iamasla...@yahoo.co.in>
Subject Re: Words Frequency Problem
Date Fri, 18 Aug 2006 07:28:20 GMT
Hi soren,
  Thanks a lot for help.
  As you suggest me, i written the code, but i wonder Field does not contain Store, Index
YES etc. It only contains Field.Keyword, Field.Text etc. Am i missing something. My Code is
as Below.
   
    indexWriter = new IndexWriter(indexpath, analyzer, false);
           // Create document
         Document doc = new Document();
           doc.add(Field.Keyword(URI_FIELD, uri.toString()));
         doc.add(Field.Text(CONTENT_TEXT, readContent(revisionDescriptor, revisionContent)));
        
  //It gives error
  /* doc.add(new Field("frequency", "field value", Field.Store.YES, Field.Index.TOKENIZED,
Field.TermVector.YES));*/
         
         if ( revisionContent != null && revisionDescriptor != null ) {
            List extractor = ExtractorManager.getInstance().getContentExtractors(uri.getNamespace().getName(),
(NodeRevisionDescriptors)null, revisionDescriptor);
                 for ( int i = 0, l = extractor.size(); i < l; i++ ) {
                      Reader reader = ((ContentExtractor)extractor.get(i)).extract(new ByteArrayInputStream(revisionContent.getContentBytes()));
                      doc.add(Field.Text(CONTENT_TEXT, reader));
                 }
            }
              indexWriter.addDocument(doc);
            indexWriter.optimize();

Sören Pekrul <soeren.pekrul@gmx.de> wrote:
  aslam bari wrote:
> I am searching for a word "circle" in my indexed document list. It gives me total document
found 4 i.e. Hits. But now i want to get how many occurances are there in each document i.e.
frequency of words in result document.

Hello Aslam,

you should store the TermVector in the index as well:

doc.add(new Field("field name", "field value", Field.Store.YES, 
Field.Index.TOKENIZED, Field.TermVector.YES));

"A term vector is a list of the document's terms and their number of 
occurences in that document."

The IndexReader allows you to access the TermVector of a document:

TermFreqVector IndexReader.getTermFreqVector(int docNumber, String field)

I hope it helps.

Sören

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



 				
---------------------------------
 Here's a new way to find what you're looking for - Yahoo! Answers 
 Send FREE SMS to your friend's mobile from Yahoo! Messenger Version 8. Get it NOW
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message