lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: conditional High Freq Terms in Lucene index
Date Sat, 31 Mar 2012 13:23:26 GMT
Hmm, you are adding two strings (k+k1 concatenates them).  You should
first add the two ints (docBase + doc), then convert the sum to a string.
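
A minimal sketch of the difference, not taken from the original collector:
the class name GlobalDocIdSketch, the helper globalDocId, and the sample
values are hypothetical and only illustrate summing the ints before the
string conversion.

```java
public class GlobalDocIdSketch {

    // Hypothetical helper: add the segment's docBase to the segment-relative
    // doc number first, then convert the sum to a string.
    static String globalDocId(int docBase, int doc) {
        return Integer.toString(docBase + doc);
    }

    public static void main(String[] args) {
        int docBase = 100; // assumed starting doc ID of the current segment
        int doc = 5;       // assumed doc number relative to that segment

        // What the quoted collect() does: concatenates the two strings.
        String k = doc + "";
        String k1 = docBase + "";
        System.out.println(k + k1);                    // prints "5100"

        // Summing the ints first gives the index-wide doc ID.
        System.out.println(globalDocId(docBase, doc)); // prints "105"
    }
}
```

Only the summed form can match the value returned by dok.doc() on the
top-level reader, which is what the later equals() comparison relies on.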

Mike McCandless

http://blog.mikemccandless.com

On Sat, Mar 31, 2012 at 8:56 AM, starz10de <farag_ahmed@yahoo.com> wrote:
> I revised it including your comment:
>
>
>
>                        private Scorer scorer;
>                        private int docBase;
>
>                        // simply print docId and score of every matching document
>                        @Override
>                        public void collect(int doc) throws IOException {
>
> String k=doc+"";
> String k1=docBase+"";
>
>
>                                  doc_ids.add(k+k1);
>
>
>
>                        }
>
>                        @Override
>                        public boolean acceptsDocsOutOfOrder() {
>                          return true;
>                        }
>
>                        @Override
>                        public void setNextReader(IndexReader reader, int docBase)
>                            throws IOException {
>                          this.docBase = docBase;
>                        }
>
>                        @Override
>                        public void setScorer(Scorer scorer) throws IOException {
>                          this.scorer = scorer;
>                        }
>
>
> In the highFrequentTerm output I can see that the condition for document
> type "A" is applied. However, the high-frequency terms are not computed
> correctly: I still see duplicate terms in the list, as well as wrong
> occurrence counts.
>
> Here is how I do it:
>
> TermInfoQueue tiq = new TermInfoQueue(numTerms);
>    TermEnum terms = reader.terms();
>    TermDocs dok =null;
>    int k=0;
>    dok = reader.termDocs();
>    if (field != null) {
>      while (terms.next()) {
>
>
>          k=0;
>
>      dok.seek(terms);
>
>        while (dok.next()) {
>
>
>
>                //System.out.println(dok.doc());
>                  for(int i=0;i< doc_ids.size();++i)
>                         {
>
>
> if(categorization_based_on_year.doc_ids.get(i).equals(dok.doc()+""))
>                    {
>
> // here I can see that only doc ids for the type "A" are printed
>
> System.out.println(dok.doc());
>
>                         if (terms.term().field().equals(field)   ) {
>                       tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
>                                }
>
>               i=10000;
>                    }
>
>                 }
> .
> .
> .
>
> Any hints?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
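
The duplicate terms described in the quoted message are consistent with the
loop inserting one queue entry per (term, document) pair, each carrying only
that document's dok.freq(). One possible approach, sketched here under
assumptions, is to sum the frequency over the accepted documents and record
each term once. The class FilteredTermFreqSketch, the method totalFreqs, and
the in-memory postings map are hypothetical stand-ins for TermEnum/TermDocs,
not Lucene API.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FilteredTermFreqSketch {

    // postings: term -> array of {docId, freqInDoc} pairs (stand-in for
    // iterating TermDocs). acceptedDocs: index-wide doc IDs gathered by the
    // collector, kept as ints so lookup is a set membership test rather
    // than a scan over a list of strings.
    static Map<String, Integer> totalFreqs(Map<String, int[][]> postings,
                                           Set<Integer> acceptedDocs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, int[][]> entry : postings.entrySet()) {
            int sum = 0;
            for (int[] posting : entry.getValue()) {
                if (acceptedDocs.contains(posting[0])) {
                    sum += posting[1]; // accumulate instead of inserting per doc
                }
            }
            if (sum > 0) {
                totals.put(entry.getKey(), sum); // one entry per term
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, int[][]> postings = new LinkedHashMap<>();
        postings.put("lucene", new int[][] {{0, 2}, {1, 3}, {2, 4}});
        postings.put("index", new int[][] {{1, 5}});

        Set<Integer> acceptedDocs = new HashSet<>();
        acceptedDocs.add(0);
        acceptedDocs.add(2);

        System.out.println(totalFreqs(postings, acceptedDocs)); // prints {lucene=6}
    }
}
```

In the quoted loop, the equivalent change would be to call
tiq.insertWithOverflow once per term, after the inner while loop over
dok.next() has finished, passing the accumulated sum.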
