Date: Tue, 14 May 2013 23:02:22 +0530
Subject: Re: TermsEnum.docFreq() returns 0
From: Ravikumar Govindarajan
To: java-user@lucene.apache.org

Thanks for the help, Mike. I was too quick to jump to the wrong conclusion.

My codec does not implement term vectors, payloads, DocValues, or norms. It
should be trivial to implement payloads, but I am not sure about the others.

Anyway, I can generate an HTML report and identify failures based on
individual tests.

--
Ravi

On Tue, May 14, 2013 at 3:31 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Tue, May 14, 2013 at 3:03 AM, Ravikumar Govindarajan wrote:
> > We ran the checkIndex and a simple test case. It passes. Actually, I had
> > assumed problem with lucene, whereas it was an issue with our custom codec.
>
> Phew, thanks for bringing closure!
>
> > I do not know how to confirm whether a new codec works correctly. Are there
> > any tools/existing test-cases available for validation?
>
> One really healthy way to test your new codec is to run all Lucene
> tests against it (assume your codec is general, i.e. implements
> everything).
>
> You just need to 1) get your codec onto the test classpath and 2) pass
> -Dtests.codec=YourCodecName to force tests to use it.
>
> I'm not certain about step 1) ... it could be passing -lib to ant does
> that? But I'm not sure that will propagate to the classpath when ant
> runs the tests ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> > --
> > Ravi
> >
> > On Mon, May 13, 2013 at 9:19 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
> >
> >> That code looks correct.
> >>
> >> But can you tie it all together into a runnable test case? Ie add in
> >> the terms enum, calling docFreq and getting 0 when it should be 1.
> >>
> >> Also, if you run CheckIndex on the index produced by the code below,
> >> how many terms/freqs/positions does it report?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Mon, May 13, 2013 at 9:25 AM, Ravikumar Govindarajan wrote:
> >> > Indexing code below. Looks very simple. Is this correct?
> >> >
> >> > IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_42,
> >> >     new StandardAnalyzer(Version.LUCENE_42));
> >> > conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
> >> > String indexPath = "";
> >> > Directory dir = FSDirectory.open(new File(indexPath));
> >> > writer = new IndexWriter(dir, conf);
> >> > FieldType type = new FieldType();
> >> > type.setTokenized(true);
> >> > type.setIndexed(true);
> >> > type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
> >> > Field field = new Field("content", "one two two three", type);
> >> > luceneDoc.add(field);
> >> > writer.addDocument(luceneDoc);
> >> > writer.close();
> >> >
> >> > Reading docFreq and totalTermFreq through terms-enum returns 0 and -1,
> >> > for all terms
> >> >
> >> > --
> >> > Ravi
> >> >
> >> > On Fri, May 10, 2013 at 10:19 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
> >> >
> >> >> It should not be 0, as long as TermsEnum.next() does not return null
> >> >> ... can you make a small test case? Thanks.
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> On Fri, May 10, 2013 at 8:26 AM, Ravikumar Govindarajan wrote:
> >> >> > I have to add that the above code is wrong.
> >> >> >
> >> >> > It has to be
> >> >> >
> >> >> > while ((ref = tEnum.next()) != null)
> >> >> > {
> >> >> >     ref = tEnum.term();
> >> >> >     tEnum.docFreq(); // Even here VAL=0
> >> >> > }
> >> >> >
> >> >> > Apologies for the mistake, but the problem remains.
> >> >> >
> >> >> > On Fri, May 10, 2013 at 5:54 PM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:
> >> >> >
> >> >> >> We have the following code:
> >> >> >>
> >> >> >> SegmentInfos segments = new SegmentInfos();
> >> >> >> segments.read(luceneDir);
> >> >> >> for (SegmentInfoPerCommit sipc : segments)
> >> >> >> {
> >> >> >>     String name = sipc.info.name;
> >> >> >>     SegmentReader reader = new SegmentReader(sipc, 1, new IOContext());
> >> >> >>     Terms terms = reader.terms("content");
> >> >> >>     TermsEnum tEnum = terms.iterator(null);
> >> >> >>     tEnum.docFreq(); //VAL=0
> >> >> >>     tEnum.totalTermFreq(); //VAL=-1
> >> >> >> }
> >> >> >>
> >> >> >> The field "content" is indexed as DOCS_AND_FREQS_AND_POSITIONS.
> >> >> >>
> >> >> >> Why is docFreq returned as 0 for all terms? Is this expected, or
> >> >> >> am I doing something wrong?
> >> >> >>
> >> >> >> --
> >> >> >> Ravi
> >> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
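
For reference, the statistics a correct codec should report for the single
test document in this thread ("one two two three") can be worked out by hand.
The sketch below is plain Java with no Lucene dependency. The class name
TermStatsSketch and its stats method are hypothetical, invented only for
illustration, and the whitespace split plus lowercasing is an assumption that
only roughly approximates StandardAnalyzer. It is a toy model of what docFreq
and totalTermFreq mean, not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only: this class is not part of Lucene.
class TermStatsSketch {

    // Returns term -> {docFreq, totalTermFreq}, with terms in sorted order.
    // docFreq counts the documents containing the term at least once;
    // totalTermFreq counts every occurrence across all documents.
    static SortedMap<String, long[]> stats(List<String> docs) {
        SortedMap<String, long[]> result = new TreeMap<>();
        for (String doc : docs) {
            Set<String> seenInDoc = new HashSet<>();
            // Assumption: lowercase + whitespace split stands in for the analyzer.
            for (String token : doc.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // an empty document yields one empty token
                }
                long[] s = result.computeIfAbsent(token, k -> new long[2]);
                if (seenInDoc.add(token)) {
                    s[0]++; // first occurrence in this document: bump docFreq
                }
                s[1]++; // every occurrence: bump totalTermFreq
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("one two two three");
        for (Map.Entry<String, long[]> e : stats(docs).entrySet()) {
            System.out.println(e.getKey()
                    + " docFreq=" + e.getValue()[0]
                    + " totalTermFreq=" + e.getValue()[1]);
        }
    }
}
```

For the one-document index above, this gives docFreq=1 for each of "one",
"three", and "two", and totalTermFreq=2 only for "two". Those are the values
to compare against when a terms enum that has been advanced with next() still
reports 0, which in this thread turned out to come from the custom codec
rather than from the indexing code.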