lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy Roberts <m...@andy-roberts.net>
Subject Re: Retrieve all terms
Date Thu, 19 May 2005 10:11:00 GMT
On Thursday 19 May 2005 06:53, Morus Walter wrote:

> I think he doesn't want the contents but a term list for these contents.
> Something like
> 1	  1
> 4	  1
> content	  2
> document  2
> for his sample, where the number is the fequency of the term.
>
> I don't think that you can easily get that from one lucene index.
> The easiest way to get a term listing for one field of one document is
> to use the term vector support. But for a document collection that would
> still mean to join all term vectors of all matched documents.

I'm not sure if this helps. I have a method that I used when experimenting 
with Lucene. It takes in a String which contains the path for an index, opens 
a reader, enumerates each term in that index. For each term, it prints out 
the term itself its frequency, and the number of documents that term appeared 
in.

public static void viewTerms(String indexPath) throws IOException {
	
		IndexReader reader = IndexReader.open(indexPath);

		TermEnum te = reader.terms();

		while (te.next()) {
			Term currentTerm = te.term();
			
			TermPositions tp = reader.termPositions(currentTerm);
			
			int termFreq = 0;
			
			while (tp.next()) {
				termFreq += tp.freq();
			}
			
			System.out.println(currentTerm.text() + "(" + termFreq + "|" + te.docFreq() 
+ ")");
		
		}

		reader.close();
	}

HTH,
Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message