lucene-dev mailing list archives

From starz10de <farag_ah...@yahoo.com>
Subject Re: conditional High Freq Terms in Lucene index
Date Sat, 31 Mar 2012 12:56:07 GMT
I revised it to take your comment into account:



    private Scorer scorer;
    private int docBase;

    // record an ID string for every matching document
    @Override
    public void collect(int doc) throws IOException {
      // the ID is built by concatenating the two numbers as strings
      String k = doc + "";
      String k1 = docBase + "";
      doc_ids.add(k + k1);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
      return true;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase)
        throws IOException {
      this.docBase = docBase;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
    }
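
One thing that may be worth checking in the collector above: in the Lucene 3.x Collector API, the doc passed to collect() is relative to the current segment, and the index-wide ID is the sum docBase + doc. Concatenating the two numbers as strings gives a different value (doc 5 with docBase 100 becomes "5100" rather than 105), which would not match dok.doc() read from a top-level reader. A minimal sketch in plain Java, without Lucene; the class GlobalDocIdSketch and its methods are hypothetical names used only to illustrate the arithmetic:

```java
import java.util.HashSet;
import java.util.Set;

public class GlobalDocIdSketch {
    // Hypothetical stand-in for the doc_ids list; a set of ints makes the
    // later membership test a single lookup instead of a scan.
    private final Set<Integer> docIds = new HashSet<Integer>();
    private int docBase;

    // Mirrors setNextReader: remember the offset of the current segment.
    public void setDocBase(int docBase) {
        this.docBase = docBase;
    }

    // Mirrors collect: store the index-wide ID as an integer sum.
    public void collect(int doc) {
        docIds.add(docBase + doc);
    }

    public boolean contains(int globalDoc) {
        return docIds.contains(globalDoc);
    }

    // What concatenating doc and docBase as strings produces instead.
    public static String concatenated(int doc, int docBase) {
        return doc + "" + docBase;
    }

    public static void main(String[] args) {
        GlobalDocIdSketch sketch = new GlobalDocIdSketch();
        sketch.setDocBase(100);
        sketch.collect(5);
        System.out.println(sketch.contains(105));   // prints true
        System.out.println(concatenated(5, 100));   // prints 5100
    }
}
```

With integer IDs in a set, the inner scan over doc_ids in the term loop could become a single contains(dok.doc()) call, assuming dok comes from the same top-level reader.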
		        
		      
I can see in the highFrequentTerm code that the condition for document
type "A" is applied. However, the high-frequency terms are not computed
correctly: I still see duplicate terms in the list, as well as wrong
occurrence counts.

Here is how I do it:

TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
TermDocs dok = null;
int k = 0;
dok = reader.termDocs();
if (field != null) {
  while (terms.next()) {
    k = 0;
    dok.seek(terms);
    while (dok.next()) {
      //System.out.println(dok.doc());
      for (int i = 0; i < doc_ids.size(); ++i) {
        if (categorization_based_on_year.doc_ids.get(i).equals(dok.doc() + "")) {
          // here I can see that only doc ids for the type "A" are printed
          System.out.println(dok.doc());
          if (terms.term().field().equals(field)) {
            // one TermInfo is inserted per matching document
            tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
          }
          i = 10000;  // leave the for loop
        }
      }
.
.
.
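
A possible explanation for the duplicates and the wrong counts: the loop above inserts a new TermInfo for every matching document, so a term that occurs in several type "A" documents enters the queue several times, each time with the frequency from one document only. A minimal sketch, in plain Java without Lucene, of summing per term over the accepted documents and inserting each term once. The Posting class, the postings map, and topTerms are hypothetical stand-ins for TermDocs, TermEnum, and TermInfoQueue, and the sketch assumes the wanted number is the total occurrences within the accepted documents (counting documents instead would add 1 rather than freq):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class FilteredTermCounts {
    // Hypothetical stand-in for one TermDocs entry: a document ID and the
    // term's frequency inside that document.
    public static final class Posting {
        public final int doc;
        public final int freq;
        public Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    // Returns up to n terms with the highest frequency summed over accepted docs.
    public static List<Map.Entry<String, Integer>> topTerms(
            Map<String, List<Posting>> postings, Set<Integer> accepted, int n) {
        Comparator<Map.Entry<String, Integer>> byCount =
            (a, b) -> Integer.compare(a.getValue(), b.getValue());
        // Min-heap on the count: the smallest of the current top n is evicted first.
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(byCount);

        for (Map.Entry<String, List<Posting>> e : postings.entrySet()) {
            int total = 0;
            for (Posting p : e.getValue()) {
                if (accepted.contains(p.doc)) {
                    total += p.freq;   // accumulate instead of inserting per document
                }
            }
            if (total == 0) {
                continue;              // term does not occur in any accepted document
            }
            // Insert once per term, after all of its documents have been seen.
            queue.add(new AbstractMap.SimpleEntry<String, Integer>(e.getKey(), total));
            if (queue.size() > n) {
                queue.poll();
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(queue);
        result.sort(byCount.reversed());   // highest count first
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> postings = new HashMap<>();
        postings.put("lucene", Arrays.asList(new Posting(1, 3), new Posting(2, 4), new Posting(7, 9)));
        postings.put("index", Arrays.asList(new Posting(1, 1), new Posting(7, 5)));
        postings.put("query", Arrays.asList(new Posting(7, 2)));
        Set<Integer> accepted = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(topTerms(postings, accepted, 2));   // prints [lucene=7, index=1]
    }
}
```

In the Lucene loop this would correspond to keeping a running total inside while (dok.next()) and calling tiq.insertWithOverflow once after that inner loop ends, with the field check done once per term rather than once per document.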

Any hints?

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.