lucene-java-user mailing list archives

From "David Fertig" <dfer...@cymfony.com>
Subject RE: Search returning documents matching a NOT range
Date Fri, 05 Nov 2010 21:18:19 GMT
Ian,
Thank you for getting back to me.  No, I do not get a bogus hit from searching the small index
alone.  Also, I do not get a hit if I delete any more documents from the larger index.

I have updated my test to use RAMDirectory and to print maxDoc() for the searchables and
the searcher; all numbers are as expected.  I have posted all the code, but did not want to
post the indexes due to their size (2.2 MB uncompressed).  I will mail them to anyone who
can help.

Here is the complete latest test code and its output:



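import java.io.File;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LuceneTest {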
    static public void main(String[] args) {
        try {
            QueryParser queryParser = new QueryParser(Version.LUCENE_30, "author", new KeywordAnalyzer());
            Query query = queryParser.parse("author:bentalcella AND NOT publish_date:[20100601 TO 20100630]");
            Searchable[] searchables = new Searchable[2];
            RAMDirectory ram1 = new RAMDirectory(new NIOFSDirectory(new File("/home/dfertig/testIndexes/b1")));
            RAMDirectory ram2 = new RAMDirectory(new NIOFSDirectory(new File("/home/dfertig/testIndexes/m1")));
            searchables[0] = new IndexSearcher(ram1, true);
            searchables[1] = new IndexSearcher(ram2, true);
            MultiSearcher searcher = new MultiSearcher(searchables);
            System.out.println("MaxDocs for index 1: " + searchables[0].maxDoc());
            System.out.println("MaxDocs for index 2: " + searchables[1].maxDoc());
            System.out.println("MaxDocs for MultiSearcher: " + searcher.maxDoc());
            System.out.println("Query: " + query.toString());
            TopDocs topDocs = searcher.search(query, 10);
            System.out.println("Results: " + topDocs.totalHits);
            // Iterate over the returned hits; totalHits can exceed the 10 requested.
            for (int in = 0; in < topDocs.scoreDocs.length; in++) {
                Document document = searcher.doc(topDocs.scoreDocs[in].doc);
                System.out.println("publish_date: " + document.get("publish_date"));
            }
            searcher.close();
            ram1.close();
            ram2.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }
}

Output:
MaxDocs for index 1: 1
MaxDocs for index 2: 1000
MaxDocs for MultiSearcher: 1001
Query: +author:bentalcella -publish_date:[20100601 TO 20100630]
Results: 1
publish_date: 20100606
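
For reference, the result I expect can be sketched in plain Java without Lucene. This is only an illustration of the intended semantics, not of what MultiSearcher does internally: the class and method names are made up, the range is assumed to be inclusive and compared as plain strings (as with KeywordAnalyzer terms), and the author value of the returned document is assumed to be bentalcella since it matched the required clause.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NotRangeSketch {
    // Inclusive range check using plain string ordering (assumption: this
    // mirrors how an un-analyzed term range [lower TO upper] compares values).
    static boolean inRange(String value, String lower, String upper) {
        return value != null
                && value.compareTo(lower) >= 0
                && value.compareTo(upper) <= 0;
    }

    // Expected semantics of:
    //   +author:bentalcella -publish_date:[20100601 TO 20100630]
    static boolean matches(Map<String, String> doc) {
        boolean required = "bentalcella".equals(doc.get("author"));
        boolean prohibited = inRange(doc.get("publish_date"), "20100601", "20100630");
        return required && !prohibited;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the single document in the small index.
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("author", "bentalcella");
        doc.put("publish_date", "20100606");
        System.out.println("Should match: " + matches(doc));
    }
}
```

Under those assumptions a document dated 20100606 falls inside the prohibited range, so the query should return no hits.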




-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com] 
Sent: Friday, November 5, 2010 4:57 PM
To: java-user@lucene.apache.org
Subject: Re: Search returning documents matching a NOT range

Do you get the bogus hit on the small index if you search that index
alone?  Are you positive it only holds the one doc? Loading the one
doc into a new RAM based index in the test would prove it.

You are more likely to get help if you post a self-contained example -
people can see everything relevant and are more likely to spot a
problem.


--
Ian.


On Thu, Nov 4, 2010 at 4:52 PM, David Fertig <dfertig@cymfony.com> wrote:
> I have an active Lucene implementation that has been in place for a
> couple of years and was recently upgraded to the 3.0.2 branch. We are now
> occasionally seeing documents returned from searches that should not be
> returned. I have reduced the code and indexes to the smallest set
> possible where I can still repeat the issue.
>
> My test case uses 2 indexes.  These indexes have been rebuilt/optimized
> using Luke 1.0.1 to make them the smallest possible.  One index has 1
> document, which is being returned by the query but should not be.  The
> other index has 1000 documents, none of which match the search criteria.
> The query should bring back 0 results, but brings back 1.  I can zip and
> mail the indexes if it would aid in helping track down this issue.
>
> public class LuceneTest {
>    static public void main(String[] args) {
>        try {
>            QueryParser queryParser = new QueryParser(Version.LUCENE_30, "author", new KeywordAnalyzer());
>            Query query = queryParser.parse("author:bentalcella AND NOT publish_date:[20100601 TO 20100630]");
>            Searchable[] searchables = new Searchable[2];
>            searchables[0] = new IndexSearcher(new NIOFSDirectory(new File("/home/dfertig/testIndexes/b1")), true);
>            searchables[1] = new IndexSearcher(new NIOFSDirectory(new File("/home/dfertig/testIndexes/m1")), true);
>            Searcher searcher = new MultiSearcher(searchables);
>            System.out.println("Query: " + query.toString());
>            TopDocs topDocs = searcher.search(query, 10);
>            System.out.println("Results: " + topDocs.totalHits);
>            for (int in = 0; in < topDocs.totalHits; in++) {
>                Document document = searcher.doc(topDocs.scoreDocs[in].doc);
>                System.out.println("publish_date: " + document.get("publish_date"));
>            }
>            searcher.close();
>        } catch (Exception e) {
>            System.out.println(e.getMessage());
>            e.printStackTrace();
>        }
>    }
> }
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

