From: Ian Lea
Date: Mon, 8 Nov 2010 12:22:51 +0000
Subject: Re: Re: Search returning documents matching a NOT range
To: Uwe Schindler
Cc: java-user@lucene.apache.org, David Fertig

It occurs in David's index and in my much simplified test/demo index.
There is nothing special in mine, so I'd guess the problem isn't really
index- or data-related, but I certainly can't vouch for that.

--
Ian.


On Mon, Nov 8, 2010 at 12:05 PM, Uwe Schindler wrote:
> That's extremely strange. If this is a bug in MultiSearcher, we should fix
> it in the proposed 3.0.3 release. Does the problem only occur with this
> particular index?
>
> ---
> Uwe Schindler
> Generics Policeman
> Bremen, Germany
>
> ----- Reply message -----
> From: "Ian Lea"
> Date: Mon, Nov 8, 2010 12:45
> Subject: Search returning documents matching a NOT range
> To:
> Cc: "David Fertig"
>
>
> This does seem extremely odd. David sent me a copy of his index and
> I've played around with it and also written a self-contained RAM index
> program, below, that shows the same problem: if the second index has
> 1000+ docs, the one and only doc in the first index is incorrectly
> matched when the search is done with a MultiSearcher. In answer to
> Uwe's question, it works correctly if I use a single IndexSearcher on
> top of a MultiReader.
>
> Tests run with lucene-core-3.0.2.jar.
>
> Snippet from program output:
>
> Larger index with 999 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
>
> Larger index with 1000 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 1
> Docno: 0
> author: /aaa/, indexed: true
> pubdate: /abc/, indexed: true
>
> -----------------------------------------------------------------------
> package test;
>
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.document.*;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.index.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.util.Version;
>
> public class LuceneTest8 {
>
>     public static void main(String[] args) throws Exception {
>         test(999);
>         test(1000);
>         test(1001);
>     }
>
>     // One-doc index plus a _max-doc index; the same NOT-range query is run
>     // against each index, a MultiReader and a MultiSearcher.
>     static void test(int _max) throws Exception {
>         System.out.printf("\n\nLarger index with %s docs\n", _max);
>         Analyzer anl = new StandardAnalyzer(Version.LUCENE_30);
>         Directory dir1 = loadIndex(anl, 1, "aaa", "abc");
>         Directory dir2 = loadIndex(anl, _max, "zzz", "zzz");
>         QueryParser qp = new QueryParser(Version.LUCENE_30, "author", anl);
>         String qstr = "author:aaa AND NOT pubdate:[aaa TO bbb]";
>         Query q = qp.parse(qstr);
>         IndexReader ir1 = IndexReader.open(dir1);
>         IndexReader ir2 = IndexReader.open(dir2);
>         Searcher searcher1 = new IndexSearcher(ir1);
>         Searcher searcher2 = new IndexSearcher(ir2);
>         MultiReader mr = new MultiReader(ir1, ir2);
>         Searcher searcherm1 = new IndexSearcher(mr);
>         MultiSearcher searcherm2 = new MultiSearcher(searcher1, searcher2);
>         search(q, searcher1, "small index");
>         search(q, searcher2, "larger index");
>         search(q, searcherm1, "multi reader");
>         search(q, searcherm2, "multi searcher");
>     }
>
>     // Builds a RAM index of _max identical docs with the given field values.
>     static Directory loadIndex(Analyzer _anl,
>                                int _max,
>                                String _author,
>                                String _pd) throws Exception {
>         RAMDirectory dir = new RAMDirectory();
>         IndexWriter iw = new IndexWriter(dir,
>                                          _anl,
>                                          true,
>                                          IndexWriter.MaxFieldLength.UNLIMITED);
>         for (int i = 0; i < _max; i++) {
>             Document d = new Document();
>             d.add(new Field("author", _author,
>                             Field.Store.YES, Field.Index.ANALYZED));
>             d.add(new Field("pubdate", _pd,
>                             Field.Store.YES, Field.Index.ANALYZED));
>             iw.addDocument(d);
>         }
>         iw.close();
>         return dir;
>     }
>
>     static void search(Query _q,
>                        Searcher _searcher,
>                        String _what) throws Exception {
>         System.out.printf("--- %s ---\n", _what);
>         System.out.printf("Query: %s\n", _q.toString());
>         System.out.printf("MaxDocs: %s\n", _searcher.maxDoc());
>         TopDocs topDocs = _searcher.search(_q, 10);
>         System.out.printf("Hit count: %s\n", topDocs.totalHits);
>         // Loop over the returned ScoreDocs (at most 10), not totalHits,
>         // which could exceed the array length.
>         for (int in = 0; in < topDocs.scoreDocs.length; in++) {
>             int docno = topDocs.scoreDocs[in].doc;
>             Document ldoc = _searcher.doc(docno);
>             System.out.printf("Docno: %s\n", docno);
>             for (Fieldable f : ldoc.getFields()) {
>                 System.out.printf("%s: /%s/, indexed: %s\n",
>                                   f.name(), f.stringValue(), f.isIndexed());
>             }
>         }
>     }
> }
>
>
> --
> Ian.
>
>
> On Mon, Nov 8, 2010 at 4:32 AM, Uwe Schindler wrote:
>> Does the same happen with a MultiReader on top of both indexes and using a
>> single IndexSearcher on top of this MultiReader?
>>
>> P.S.: How about using NumericField?
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: David Fertig [mailto:dfertig@cymfony.com]
>>> Sent: Monday, November 08, 2010 4:21 AM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Search returning documents matching a NOT range
>>>
>>> publish_date is a string, formatted as YYYYMMDD, so string sorting
>>> should work correctly for this field.
>>>
>>> The field is indexed as a keyword and the field's value is also stored.
>>>
>>> I have previously reviewed the terms and optimized the index with Luke
>>> 1.0.1 to make sure there was no index corruption. It is a very useful
>>> tool; however, it can only open one index at a time, so I can't
>>> reproduce the issue with it.
>>>
>>> At your suggestion I added code to enumerate all terms in the indexes,
>>> and there are no inconsistencies.
>>>
>>> Th
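A small self-contained sketch (plain Java, no Lucene jar needed) of the two orderings this thread relies on. It assumes, as I understand Lucene 3.0's TermRangeQuery when no Collator is given, that a term matches [lower TO upper] when it falls between the bounds under String.compareTo; the class LexRangeCheck and its methods are hypothetical. It shows that the only doc in Ian's small index (pubdate "abc") lies inside [aaa TO bbb], so the prohibited clause should exclude it and 0 hits is the expected answer, and that fixed-width YYYYMMDD strings like David's publish_date order the same way as their integer values, while unpadded numbers do not.

```java
/**
 * Hypothetical helper, not part of Lucene: an inclusive term-range check using
 * plain String.compareTo ordering (assumed to match a range query without a Collator).
 */
public class LexRangeCheck {

    // True if term lies in [lower TO upper], both bounds inclusive.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // True if two digit strings order the same way as strings and as ints.
    static boolean sameOrder(String a, String b) {
        int asStrings = Integer.signum(a.compareTo(b));
        int asInts = Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
        return asStrings == asInts;
    }

    public static void main(String[] args) {
        // Small index: its single doc has pubdate "abc", which is inside the
        // range, so "+author:aaa -pubdate:[aaa TO bbb]" should not match it.
        System.out.println("abc in [aaa TO bbb]: " + inRange("abc", "aaa", "bbb"));
        // Larger index: pubdate "zzz" is outside the range.
        System.out.println("zzz in [aaa TO bbb]: " + inRange("zzz", "aaa", "bbb"));

        // Fixed-width YYYYMMDD strings sort like the dates they encode...
        String[] dates = {"20091231", "20100101", "20101108"};
        for (int i = 0; i + 1 < dates.length; i++) {
            System.out.println(dates[i] + " vs " + dates[i + 1] + " same order: "
                    + sameOrder(dates[i], dates[i + 1]));
        }
        // ...but unpadded numbers do not: "9" sorts after "10" as a string.
        System.out.println("9 vs 10 same order: " + sameOrder("9", "10"));
    }
}
```

The fixed-width condition is what makes a string range over dates safe; with variable-width values, a string range and a numeric range can disagree.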