Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <20101108120512.B7814D36002@mail.troja.net>
References: <20101108120512.B7814D36002@mail.troja.net>
From: Ian Lea <ian.lea@gmail.com>
Date: Mon, 8 Nov 2010 12:22:51 +0000
Message-ID: <AANLkTikwfsWN5iRx5eYzpS+X5sqvkmjb-dS3vTgoOgbs@mail.gmail.com>
Subject: Re: Antw.: Search returning documents matching a NOT range
To: Uwe Schindler <uwe@thetaphi.de>
Cc: java-user@lucene.apache.org, David Fertig <dfertig@cymfony.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

It occurs in David's index and in my much simplified test/demo index.
There is nothing special in mine so I'd guess the problem isn't really
index or data related, but certainly can't vouch for that.
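
For what it's worth, the expected answer can be checked without Lucene at
all. Below is a minimal sketch in plain Java, assuming that String.compareTo
ordering is a fair stand-in for how the range is evaluated on these
lowercase ASCII values. LexRange and inRange are made-up names for
illustration, not Lucene API.

```java
// Hypothetical helper, not part of Lucene: mimics an inclusive,
// lexicographic string range such as pubdate:[aaa TO bbb].
final class LexRange {

    // True when lower <= value <= upper under String.compareTo ordering.
    static boolean inRange(String value, String lower, String upper) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // The single doc in the small test index has pubdate "abc".
        System.out.println(inRange("abc", "aaa", "bbb")); // true
        // Every doc in the larger test index has pubdate "zzz".
        System.out.println(inRange("zzz", "aaa", "bbb")); // false
        // Fixed-width YYYYMMDD strings compare in the same order as the dates.
        System.out.println(inRange("20101108", "20101101", "20101130")); // true
    }
}
```

Since "abc" falls inside [aaa TO bbb], the NOT clause should exclude that
doc, so 0 hits looks like the right answer and the 1 hit from MultiSearcher
does not.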


--
Ian.


On Mon, Nov 8, 2010 at 12:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> That's extremely strange. If this is a bug in MultiSearcher, we should fix it
> in the proposed 3.0.3 release. Does the problem only occur with this special
> index?
>
> ---
> Uwe Schindler
> Generics Policeman
> Bremen, Germany
>
> ----- Reply message -----
> From: "Ian Lea" <ian.lea@gmail.com>
> Date: Mon, Nov 8, 2010 12:45
> Subject: Search returning documents matching a NOT range
> To: <java-user@lucene.apache.org>
> Cc: "David Fertig" <dfertig@cymfony.com>
>
>
> This does seem extremely odd. David sent me a copy of his index and
> I've played around with it and also written a self-contained RAM index
> program, below, that shows the same problem, namely that if the second
> index has 1000 or more docs, the one and only doc in the first index is
> incorrectly matched if the search is done with a MultiSearcher. In
> answer to Uwe's question, it works correctly if I use a single
> IndexSearcher on top of a MultiReader.
>
> Tests run with lucene-core-3.0.2.jar.
>
> Snippet from program output:
>
> Larger index with 999 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
>
> Larger index with 1000 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 1
> Docno: 0
> author: /aaa/, indexed: true
> pubdate: /abc/, indexed: true
>
> -----------------------------------------------------------------------
> package test;
>
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.document.*;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.index.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.util.Version;
>
> public class LuceneTest8 {
>
>     public static void main(String[] args) throws Exception {
>         test(999);
>         test(1000);
>         test(1001);
>     }
>
>
>     static void test(int _max) throws Exception {
>         System.out.printf("\n\nLarger index with %s docs\n", _max);
>         Analyzer anl = new StandardAnalyzer(Version.LUCENE_30);
>         Directory dir1 = loadIndex(anl, 1, "aaa", "abc");
>         Directory dir2 = loadIndex(anl, _max, "zzz", "zzz");
>         QueryParser qp = new QueryParser(Version.LUCENE_30, "author", anl);
>         String qstr = "author:aaa AND NOT pubdate:[aaa TO bbb]";
>         Query q = qp.parse(qstr);
>         IndexReader ir1 = IndexReader.open(dir1);
>         IndexReader ir2 = IndexReader.open(dir2);
>         Searcher searcher1 = new IndexSearcher(ir1);
>         Searcher searcher2 = new IndexSearcher(ir2);
>         MultiReader mr = new MultiReader(ir1, ir2);
>         Searcher searcherm1 = new IndexSearcher(mr);
>         MultiSearcher searcherm2 = new MultiSearcher(searcher1, searcher2);
>         search(q, searcher1, "small index");
>         search(q, searcher2, "larger index");
>         search(q, searcherm1, "multi reader");
>         search(q, searcherm2, "multi searcher");
>     }
>
>
>
>     static Directory loadIndex(Analyzer _anl,
>                                int _max,
>                                String _author,
>                                String _pd) throws Exception {
>         RAMDirectory dir = new RAMDirectory();
>         IndexWriter iw = new IndexWriter(dir,
>                                          _anl,
>                                          true,
>                                          IndexWriter.MaxFieldLength.UNLIMITED);
>         for (int i = 0; i < _max; i++) {
>             Document d = new Document();
>             d.add(new Field("author", _author,
>                             Field.Store.YES, Field.Index.ANALYZED));
>             d.add(new Field("pubdate", _pd,
>                             Field.Store.YES, Field.Index.ANALYZED));
>             iw.addDocument(d);
>         }
>         iw.close();
>         return dir;
>     }
>
>
>     static void search(Query _q,
>                        Searcher _searcher,
>                        String _what) throws Exception {
>         System.out.printf("--- %s ---\n", _what);
>         System.out.printf("Query: %s\n", _q.toString());
>         System.out.printf("MaxDocs: %s\n", _searcher.maxDoc());
>         TopDocs topDocs = _searcher.search(_q, 10);
>         System.out.printf("Hit count: %s\n", topDocs.totalHits);
>         // scoreDocs holds at most 10 entries, so bound the loop by its
>         // length rather than by totalHits.
>         for (int in = 0; in < topDocs.scoreDocs.length; in++) {
>             int docno = topDocs.scoreDocs[in].doc;
>             Document ldoc = _searcher.doc(docno);
>             System.out.printf("Docno: %s\n", docno);
>             for (Fieldable f : ldoc.getFields()) {
>                 System.out.printf("%s: /%s/, indexed: %s\n",
>                                   f.name(), f.stringValue(), f.isIndexed());
>             }
>         }
>     }
> }
>
>
> --
> Ian.
>
>
> On Mon, Nov 8, 2010 at 4:32 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> Does the same happen with a MultiReader on top of both indexes and using a
>> single IndexSearcher on top of this MultiReader?
>>
>> P.S.: How about using NumericField?
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: David Fertig [mailto:dfertig@cymfony.com]
>>> Sent: Monday, November 08, 2010 4:21 AM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Search returning documents matching a NOT range
>>>
>>> publish_date is a string, formatted as YYYYMMDD, so string sorting should
>>> work correctly for this field.
>>>
>>> The field is indexed as a keyword and the field's value is also stored.
>>>
>>> I have previously reviewed the terms and optimized the index with Luke
>>> 1.0.1 to make sure there was no index corruption. It is a very useful tool;
>>> however, it can only open one index at a time, so I can't reproduce the
>>> issue with it.
>>>
>>> At your suggestion I added code to enumerate all terms in the indexes, and
>>> there are no inconsistencies.
>>>
>>> Th
>
>
