lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: TermsEnum.docFreq() returns 0
Date Tue, 14 May 2013 17:32:22 GMT
Thanks for the help, Mike. I was too quick to jump to the wrong conclusion.

My codec does not implement term vectors, payloads, DocValues or norms.

Implementing payloads should be trivial, but I am not sure about the others.

Anyway, I can generate an HTML report and identify the failures from the
individual tests.

--
Ravi


On Tue, May 14, 2013 at 3:31 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Tue, May 14, 2013 at 3:03 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > We ran CheckIndex and a simple test case, and both pass. Actually, I had
> > assumed a problem with Lucene, whereas it was an issue with our custom
> > codec.
>
> Phew, thanks for bringing closure!
>
> > I do not know how to confirm whether a new codec works correctly. Are
> > there any tools/existing test-cases available for validation?
>
> One really healthy way to test your new codec is to run all Lucene
> tests against it (assuming your codec is general, i.e. implements
> everything).
>
> You just need to 1) get your codec onto the test classpath and 2) pass
> -Dtests.codec=YourCodecName to force tests to use it.
>
> I'm not certain about step 1) ... perhaps passing -lib to ant does
> that? But I'm not sure that will propagate to the classpath when ant
> runs the tests ...
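On step 1): Lucene 4.x discovers codecs through Java's service-provider mechanism, so a custom codec is normally registered by listing its class name in a META-INF/services/org.apache.lucene.codecs.Codec file that must be visible on the test classpath. The stdlib-only diagnostic below is a minimal sketch for checking whether any such registration file is visible to a given JVM and which classes it lists. The class name CodecSpiCheck and the sample provider com.example.MyCodec are hypothetical. As far as I know, -Dtests.codec is matched against the name the codec reports via getName(), which does not have to equal its class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic (Java 9+): is a Lucene codec SPI registration visible on the classpath?
public class CodecSpiCheck {
    // Provider-configuration file through which Lucene 4.x loads codecs
    static final String SERVICE_FILE = "META-INF/services/org.apache.lucene.codecs.Codec";

    // Parses provider-file content: one class name per line, '#' starts a comment,
    // surrounding whitespace and blank lines are ignored
    public static List<String> parseProviders(String content) {
        List<String> names = new ArrayList<>();
        for (String raw : content.split("\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    // Collects the class names from every copy of the service file the class loader can see
    public static List<String> registeredCodecClasses() throws IOException {
        List<String> all = new ArrayList<>();
        ClassLoader cl = CodecSpiCheck.class.getClassLoader();
        for (URL url : Collections.list(cl.getResources(SERVICE_FILE))) {
            try (InputStream in = url.openStream()) {
                all.addAll(parseProviders(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Shape of a hypothetical registration file for a custom codec
        System.out.println(parseProviders("# my codec\ncom.example.MyCodec\n"));
        // An empty list means no codec registration file is on this classpath
        System.out.println("Registered codec classes: " + registeredCodecClasses());
    }
}
```

Running it with the same classpath ant uses for the tests should show whether your codec's class appears next to Lucene's built-in codecs.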
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>
> > --
> > Ravi
> >
> >
> >
> > On Mon, May 13, 2013 at 9:19 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> That code looks correct.
> >>
> >> But can you tie it all together into a runnable test case? I.e., add in
> >> the terms enum, calling docFreq and getting 0 when it should be 1.
> >>
> >> Also, if you run CheckIndex on the index produced by the code below,
> >> how many terms/freqs/positions does it report?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Mon, May 13, 2013 at 9:25 AM, Ravikumar Govindarajan
> >> <ravikumar.govindarajan@gmail.com> wrote:
> >> > Indexing code below. Looks very simple. Is this correct?
> >> >
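> >> >     import java.io.File;
> >> >     import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >> >     import org.apache.lucene.document.Document;
> >> >     import org.apache.lucene.document.Field;
> >> >     import org.apache.lucene.document.FieldType;
> >> >     import org.apache.lucene.index.FieldInfo.IndexOptions;
> >> >     import org.apache.lucene.index.IndexWriter;
> >> >     import org.apache.lucene.index.IndexWriterConfig;
> >> >     import org.apache.lucene.index.IndexWriterConfig.OpenMode;
> >> >     import org.apache.lucene.store.Directory;
> >> >     import org.apache.lucene.store.FSDirectory;
> >> >     import org.apache.lucene.util.Version;
> >> >
> >> >     IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_42,
> >> >         new StandardAnalyzer(Version.LUCENE_42));
> >> >     conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
> >> >     String indexPath = "<some-file-path>";
> >> >     Directory dir = FSDirectory.open(new File(indexPath));
> >> >     IndexWriter writer = new IndexWriter(dir, conf);
> >> >     // Indexed and tokenized (not stored), keeping docs, freqs and positions
> >> >     FieldType type = new FieldType();
> >> >     type.setTokenized(true);
> >> >     type.setIndexed(true);
> >> >     type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
> >> >     Document luceneDoc = new Document();
> >> >     luceneDoc.add(new Field("content", "one two two three", type));
> >> >     writer.addDocument(luceneDoc);
> >> >     writer.close();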
> >> >
> >> > Reading docFreq and totalTermFreq through the terms enum returns 0 and
> >> > -1, respectively, for all terms.
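For reference, the values that should come back for this single document can be worked out by hand: each of one, two and three occurs in exactly one document, so docFreq should be 1 for all three terms, while totalTermFreq should be 1, 2 and 1 respectively. The sketch below computes those statistics with a toy inverted index. It is an illustrative assumption, not Lucene code: the class name ExpectedTermStats is hypothetical, and it simply lower-cases and splits on whitespace, which for this input should yield the same tokens as StandardAnalyzer.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy: expected per-term statistics for a handful of documents
public class ExpectedTermStats {
    public static final class Stats {
        public int docFreq;        // number of documents containing the term
        public long totalTermFreq; // total occurrences of the term across all documents
    }

    public static Map<String, Stats> computeStats(String... docs) {
        Map<String, Stats> stats = new TreeMap<>(); // sorted, like a terms dictionary
        for (String doc : docs) {
            // Count occurrences of each token within this one document
            Map<String, Integer> perDoc = new TreeMap<>();
            for (String tok : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!tok.isEmpty()) {
                    perDoc.merge(tok, 1, Integer::sum);
                }
            }
            // Each distinct term adds 1 to docFreq and its in-doc count to totalTermFreq
            for (Map.Entry<String, Integer> e : perDoc.entrySet()) {
                Stats s = stats.computeIfAbsent(e.getKey(), k -> new Stats());
                s.docFreq += 1;
                s.totalTermFreq += e.getValue();
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Stats> e : computeStats("one two two three").entrySet()) {
            System.out.println(e.getKey() + " docFreq=" + e.getValue().docFreq
                    + " totalTermFreq=" + e.getValue().totalTermFreq);
        }
    }
}
```

Running main prints "one docFreq=1 totalTermFreq=1", "three docFreq=1 totalTermFreq=1" and "two docFreq=1 totalTermFreq=2", so a codec reporting 0 and -1 for these terms is not writing or reading its term statistics correctly.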
> >> >
> >> > --
> >> > Ravi
> >> >
> >> >
> >> > On Fri, May 10, 2013 at 10:19 PM, Michael McCandless <
> >> > lucene@mikemccandless.com> wrote:
> >> >
> >> >> It should not be 0, as long as TermsEnum.next() does not return null
> >> >> ... can you make a small test case?  Thanks.
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >>
> >> >> On Fri, May 10, 2013 at 8:26 AM, Ravikumar Govindarajan
> >> >> <ravikumar.govindarajan@gmail.com> wrote:
> >> >> > I have to add that the above code is wrong.
> >> >> >
> >> >> > It has to be
> >> >> >
> >> >> >     BytesRef ref;
> >> >> >     while ((ref = tEnum.next()) != null)
> >> >> >     {
> >> >> >         // next() already returns the current term, so calling tEnum.term() again is redundant
> >> >> >         tEnum.docFreq(); // Even here VAL=0
> >> >> >     }
> >> >> >
> >> >> > Apologies for the mistake, but the problem remains.
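The corrected loop follows the usual TermsEnum iteration pattern: advance with next() until it returns null, and read per-term statistics such as docFreq() and totalTermFreq() only while the enum is positioned on a term (calling docFreq() before next(), as in the originally posted code, reads from an unpositioned enum). The stdlib-only sketch below mimics that contract with a hypothetical ToyTermsEnum backed by a TreeMap. It is not Lucene's class, and its choice to throw IllegalStateException when unpositioned is an assumption made here to make the misuse visible, not documented Lucene behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy, NOT org.apache.lucene.index.TermsEnum: it only models the
// "call next() first, then read per-term stats" iteration pattern
public class ToyTermsEnum {
    private final Iterator<Map.Entry<String, int[]>> it;
    private Map.Entry<String, int[]> current; // null until next() positions the enum

    // postings: term -> {docFreq, totalTermFreq}, iterated in sorted term order
    public ToyTermsEnum(TreeMap<String, int[]> postings) {
        this.it = postings.entrySet().iterator();
    }

    // Advances to the next term; returns null once the terms are exhausted
    public String next() {
        current = it.hasNext() ? it.next() : null;
        return current == null ? null : current.getKey();
    }

    public int docFreq() {
        if (current == null) throw new IllegalStateException("enum is not positioned on a term");
        return current.getValue()[0];
    }

    public long totalTermFreq() {
        if (current == null) throw new IllegalStateException("enum is not positioned on a term");
        return current.getValue()[1];
    }

    // Same loop shape as the corrected snippet: stats are read only while positioned
    public static List<String> dump(ToyTermsEnum termsEnum) {
        List<String> out = new ArrayList<>();
        String term;
        while ((term = termsEnum.next()) != null) {
            out.add(term + ":" + termsEnum.docFreq() + ":" + termsEnum.totalTermFreq());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("one", new int[] {1, 1});
        postings.put("two", new int[] {1, 2});
        postings.put("three", new int[] {1, 1});
        System.out.println(dump(new ToyTermsEnum(postings)));
    }
}
```

Running main prints [one:1:1, three:1:1, two:1:2]; terms come out in sorted order, as they would from a terms dictionary.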
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Fri, May 10, 2013 at 5:54 PM, Ravikumar Govindarajan <
> >> >> > ravikumar.govindarajan@gmail.com> wrote:
> >> >> >
> >> >> >> We have the following code
> >> >> >>
> >> >> >>     SegmentInfos segments = new SegmentInfos();
> >> >> >>     segments.read(luceneDir);
> >> >> >>     for (SegmentInfoPerCommit sipc : segments)
> >> >> >>     {
> >> >> >>         String name = sipc.info.name;
> >> >> >>         SegmentReader reader = new SegmentReader(sipc, 1, new IOContext());
> >> >> >>         Terms terms = reader.terms("content");
> >> >> >>         TermsEnum tEnum = terms.iterator(null);
> >> >> >>         tEnum.docFreq();       //VAL=0
> >> >> >>         tEnum.totalTermFreq(); //VAL=-1
> >> >> >>     }
> >> >> >>
> >> >> >> The field "content" is indexed as DOCS_AND_FREQS_AND_POSITIONS.
> >> >> >>
> >> >> >> Why is docFreq returned as 0 for all terms? Is this expected, or am I
> >> >> >> doing something wrong?
> >> >> >>
> >> >> >> --
> >> >> >> Ravi
> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >>
> >>
> >>
>
>
>
