From: Michael McCandless
To: java-user@lucene.apache.org, jerven.bolleman@isb-sib.ch
Date: Sun, 4 Jul 2010 14:08:47 -0400
Subject: Re: Unsupported operation in TermDocs.next() when migrating from 2.4 to 2.9
In-Reply-To: <4C29BBD1.3050708@isb-sib.ch>

That is spooky. It certainly sounds like a regression.

It's odd that your MultiTermDocs is pulling an AllTermDocs under the
hood. This should only happen if you did a .seek(null) on it, but your
code seems to first check that term != null, so it should never pass a
null term.

Can you add a temporary assert to DirectoryReader.java, in 29x, around
line 1191? It should be this method:

    protected TermDocs termDocs(IndexReader reader) throws IOException {
      return term==null ? reader.termDocs(null) : reader.termDocs();
    }

Add an assert term != null, run your code w/ assertions on, and see
if it trips (the assert is not safe, in general, but should not trip in
how I think you are using it).

If it does trip... try to track down how a null term got in there?

Mike

On Tue, Jun 29, 2010 at 5:24 AM, Jerven Bolleman wrote:
> Hi All,
>
> I am finally having some time to upgrade our lucene from the 2.4 series to
> the 2.9 series. And I am having a problem that while everything compiles
> great I am getting a new UnsupportedOperationException.
>
>
> java.lang.UnsupportedOperationException
>        at org.apache.lucene.index.AbstractAllTermDocs.seek(AbstractAllTermDocs.java:42)
>        at org.apache.lucene.index.DirectoryReader$MultiTermDocs.termDocs(DirectoryReader.java:1186)
>        at org.apache.lucene.index.DirectoryReader$MultiTermDocs.next(DirectoryReader.java:1118)
>        at org.expasy.core.index.SubQueryFilter.fastForLargeResultSets(SubQueryFilter.java:129)
>
> I copied in the code that calls this. See an explanation of what it tries to
> achieve underneath.
>
> private void fastForLargeResultSets(String foreignField, BitSet bits,
>         TermDocs docs, TermDocs foreignDocs, IndexReader foreignReader,
>         BitSet queryResults)
>         throws IOException
> {
>     int start = queryResults.nextSetBit(0);
>     TermEnum foreignEnum = foreignReader.terms(new Term(foreignField, ""));
>     while (foreignEnum.next())
>     {
>         Term term = foreignEnum.term();
>         if (term == null || !term.field().equals(foreignField))
>             break;
>         if (!term.text().equals("not_null"))
>         {
>             foreignDocs.skipTo(start);
>             foreignDocs.seek(term);
>             //Source of exception in my code
>             while (foreignDocs.next())
>             {
>                 int doc = foreignDocs.doc();
>                 if (queryResults.get(doc))
>                 {
>                     foreignDocs.skipTo(doc);
>                     if (term != null && term.text() != null)
>                         buffer.add(term.text());
>                 }
>                 // Use a buffer to avoid jumping around on disk too much.
>                 if (buffer.size() >= BUFFERSIZE)
>                 {
>                     emptyBuffer(buffer, bits, docs);
>                 }
>             }
>         }
>     }
>
>     if (!buffer.isEmpty())
>     {
>         emptyBuffer(buffer, bits, docs);
>     }
> }
>
> The purpose of this code is to fill a bitset as a filter. The filter is used
> to find documents in index a that have a linking key value to them in index
> b.
>
> While resource intensive, this code path was quite fast when you have
> multimillion documents in index b pointing to multimillion documents in
> index a.
>
> i.e. it creates a "join" between two queries on different indexes.
>
> For a live example:
> http://www.uniprot.org/uniprot/?query=citation%3A%28author%3Afink%29
> This is a search for fink in the field author in the "citation" index.
> For each document in the "citation" index that matches term "fink" in the
> field "author", retrieve the terms that contain a uniquely identifying key
> value for documents in the "uniprot" index. Generate a bitset to use in
> filtering the documents in the "uniprot" index (done in the emptybuffer
> method).
>
> Is this a bug? And does anyone have ideas for an effective (maybe superior)
> workaround?
>
> Regards and thanks for a great project!
>
> Jerven
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
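
For anyone following the thread without the indexes at hand, here is a rough sketch of the "join" idea described in the quoted message, written against plain in-memory maps rather than Lucene's TermEnum/TermDocs. Everything in it is a hypothetical stand-in: the class BitSetJoinSketch, the join method, the term-to-doc-id maps, and in particular emptyBuffer, whose real body is not shown in the thread. Unlike the quoted code, this sketch records each key at most once per flush and does not call any Lucene API, so it says nothing about the exception itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BitSetJoinSketch {

    /**
     * foreignPostings: key term -> doc ids in index b that carry that key (assumed layout).
     * queryResults:    docs in index b that matched the query.
     * targetPostings:  key term -> doc ids in index a identified by that key (assumed layout).
     * Returns a BitSet over index a usable as a filter.
     */
    static BitSet join(Map<String, int[]> foreignPostings, BitSet queryResults,
                       Map<String, int[]> targetPostings, int targetMaxDoc,
                       int bufferSize) {
        BitSet bits = new BitSet(targetMaxDoc);
        List<String> buffer = new ArrayList<String>();
        for (Map.Entry<String, int[]> entry : foreignPostings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (queryResults.get(doc)) {
                    buffer.add(entry.getKey());
                    break; // one matching doc is enough to keep this key
                }
            }
            // Flush in batches, mirroring the BUFFERSIZE check in the quoted code.
            if (buffer.size() >= bufferSize) {
                emptyBuffer(buffer, bits, targetPostings);
            }
        }
        if (!buffer.isEmpty()) {
            emptyBuffer(buffer, bits, targetPostings);
        }
        return bits;
    }

    /** Assumed behaviour: set a bit for every index-a doc that carries a buffered key. */
    static void emptyBuffer(List<String> buffer, BitSet bits,
                            Map<String, int[]> targetPostings) {
        for (String key : buffer) {
            int[] docs = targetPostings.get(key);
            if (docs != null) {
                for (int doc : docs) {
                    bits.set(doc);
                }
            }
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        Map<String, int[]> citation = new TreeMap<String, int[]>();
        citation.put("P1", new int[] {0, 2});
        citation.put("P2", new int[] {1});
        citation.put("P3", new int[] {2});

        Map<String, int[]> uniprot = new TreeMap<String, int[]>();
        uniprot.put("P1", new int[] {5});
        uniprot.put("P2", new int[] {6});
        uniprot.put("P3", new int[] {7});

        BitSet queryResults = new BitSet();
        queryResults.set(2); // only citation doc 2 matched the query

        System.out.println(join(citation, queryResults, uniprot, 8, 2)); // prints {5, 7}
    }
}
```

In the toy data, citation doc 2 carries keys P1 and P3, so only the uniprot docs identified by those keys (5 and 7) end up in the filter.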