lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: TermDoc to TermDocsEnum
Date Wed, 23 Mar 2011 18:25:41 GMT
Hi,

If I understand correctly what you are trying to do as far as getting corpusTF, you might
want to look at the implementation of the "-t" flag in  org.apache.lucene.misc/HighFreqTerms.java
in contib.  

Take a look at the getTotalTermFreq method in trunk.


http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=markup

3.x version here:

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=markup

Tom
http://www.hathitrust.org/blogs/large-scale-search


-----Original Message-----
From: nitinhardeniya [mailto:nitinhardeniya@gmail.com] 
Sent: Tuesday, March 22, 2011 1:57 PM
To: java-user@lucene.apache.org
Subject: TermDoc to TermDocsEnum

hi

I have a code that work fine with lucene 3.2 where i used TermDocs to find
the corpusTF here is the code 


public void calculateCorpusTF(IndexReader reader) throws IOException {
		// TODO Auto-generated method stub
		Iterator it = word.iterator();

		Iterator  iwp  = word_prop.iterator();
		wordProp wp;
		Term ta = null;

			TermDocs tds;
	//	DocsEnum tds;
		String text;
		tfDoc tfcoll;  
		long freq=0;
		OpenBitSet skipDocs = null;
		skipDocs = new OpenBitSet(0);
		//System.out.println("Length: "+skipDocs.length());
		try {

			while(it.hasNext())
			{	
				text=it.next();
				wp=iwp.next();

				System.out.println("Word is "+text);
				ta= new Term("content",text);
				//BytesRef term = new BytesRef(text.toCharArray(),0,text.length());

				tfcoll = new tfDoc();
				freq=0;

								tds=reader.termDocs(ta);
							//	tds=reader.terms("content");
								if(tds!=null)
								{
									while(tds.next())
									{
									
										freq+=tds.freq();
									//	System.out.print( text +"  "+ freq);
									}
								}

				// New Code -->

//				tds = reader.termDocsEnum(skipDocs, "content", term);
//				if(tds!=null)
//				{
//					while(true) {
//						freq += tds.freq();
//						final int docID = tds.nextDoc();
//						if (docID == DocsEnum.NO_MORE_DOCS) {
//							break;
//						}
//					}
//				}
//				
				// New code Ends <--

				tfcoll.tfA=freq;
				System.out.print( text +"  "+ freq);
				if(tfcoll.totalTF()==0)
				{
					//System.out.println(" "+tfcoll.tfA+" "+tfcoll.tfD+" "+tfcoll.tfC);
					System.out.println("Text "+text+ " Freq "+freq);
				}

				wp.tfColl=tfcoll;
			}
		}
		catch (Exception e) {
			// TODO: handle exception
			e.printStackTrace();
		}

	}

but now i have to use TermDocEnum because i am using lucene dev4.0 which
does not have TermDocs method i was trying to change my code .please refer
to new code [commented ] and tell me how to use this method in a proper way
. if you can provide an example that would be great.
tds = reader.termDocsEnum(skipDocs, "content", term);
I have tried using null at skipdoc because i don't want to skip anything but
it through error

please help

--
View this message in context: http://lucene.472066.n3.nabble.com/TermDoc-to-TermDocsEnum-tp2716046p2716046.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message