lucene-dev mailing list archives

From "Trejkaz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
Date Wed, 02 Jun 2010 01:55:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874373#action_12874373 ]

Trejkaz commented on LUCENE-2348:
---------------------------------

I attempted to write a test, but the filtered search matches 0 documents instead of the 2
I would have expected. Here is the code:

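{code:java}
// Imports for Lucene 2.9.x (DuplicateFilter is in contrib/queries) and JUnit 4.
import static org.junit.Assert.assertEquals;

import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.DuplicateFilter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.junit.Test;

// The test class name is illustrative.
public class DuplicateFilterAcrossSegmentsTest
{
    @Test
    public void testDuplicateFilterAcrossSegments() throws Exception
    {
        // Two single-document indexes, both holding a document with id "1".
        RAMDirectory index1Dir = new RAMDirectory();
        addDoc(index1Dir);

        RAMDirectory index2Dir = new RAMDirectory();
        addDoc(index2Dir);

        IndexReader reader1 = IndexReader.open(index1Dir, true);
        IndexReader reader2 = IndexReader.open(index2Dir, true);

        // Each sub-reader of the MultiReader is searched as a separate segment.
        IndexReader multi = new MultiReader(new IndexReader[] { reader1, reader2 });
        IndexSearcher searcher = new IndexSearcher(multi);

        TopDocs docs;

        docs = searcher.search(new MatchAllDocsQuery(), null, 10);
        assertEquals("Should only be two hits without the filter (just checking)", 2, docs.totalHits);

        docs = searcher.search(new MatchAllDocsQuery(), new DuplicateFilter("id"), 10);
        assertEquals("Should only be one hit because the second was a duplicate", 1, docs.totalHits);
    }

    private void addDoc(Directory dir) throws IOException
    {
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            Document doc = new Document();
            // Field.Index.NO: "id" is stored but not indexed, so it contributes no terms.
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
            writer.commit();
        }
        finally
        {
            writer.close();
        }
    }
}
{code}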


> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>
> DuplicateFilter currently works by building a single doc ID set, without taking into
> account that getDocIdSet() will be called once per segment and only with each segment's
> local reader.
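To make the per-segment problem concrete, here is a stdlib-only Java sketch (not Lucene code) that models each segment as an array of `id` values, one per document. `perSegment` mimics deduplicating with only one segment's documents, keeping the last occurrence of each value (an assumption for this sketch). `acrossSegments` is one hypothetical alternative: it dedupes over all segments first, then gives each segment its own slice of local doc IDs. All class and method names are invented for illustration, and this is not necessarily how the issue was resolved.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of LUCENE-2348: a "segment" is a String[] of "id" values,
// indexed by local doc ID.
public class DuplicateFilterSketch {

    // Buggy shape: dedupe using only this segment's own documents.
    public static BitSet perSegment(String[] segment) {
        Map<String, Integer> lastDoc = new HashMap<>();
        for (int doc = 0; doc < segment.length; doc++) {
            lastDoc.put(segment[doc], doc); // later docs overwrite earlier ones
        }
        BitSet bits = new BitSet(segment.length);
        for (int doc : lastDoc.values()) {
            bits.set(doc);
        }
        return bits;
    }

    // Alternative: dedupe across all segments, then return one bit set per
    // segment, expressed in that segment's local doc IDs.
    public static BitSet[] acrossSegments(List<String[]> segments) {
        Map<String, int[]> last = new HashMap<>(); // value -> {segment, local doc}
        for (int s = 0; s < segments.size(); s++) {
            String[] seg = segments.get(s);
            for (int doc = 0; doc < seg.length; doc++) {
                last.put(seg[doc], new int[] { s, doc });
            }
        }
        BitSet[] result = new BitSet[segments.size()];
        for (int s = 0; s < result.length; s++) {
            result[s] = new BitSet(segments.get(s).length);
        }
        for (int[] loc : last.values()) {
            result[loc[0]].set(loc[1]);
        }
        return result;
    }

    public static int countPerSegment(List<String[]> segments) {
        int hits = 0;
        for (String[] seg : segments) {
            hits += perSegment(seg).cardinality();
        }
        return hits;
    }

    public static int countAcrossSegments(List<String[]> segments) {
        int hits = 0;
        for (BitSet bits : acrossSegments(segments)) {
            hits += bits.cardinality();
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two single-document segments, both with id "1", mirroring the test above.
        List<String[]> segments = List.of(new String[] { "1" }, new String[] { "1" });
        System.out.println("per-segment hits:    " + countPerSegment(segments));    // 2
        System.out.println("across-segment hits: " + countAcrossSegments(segments)); // 1
    }
}
```

Because each segment only sees its own documents, the per-segment version cannot tell that the two `id` values collide, so it keeps both. Deduplicating over the whole reader first avoids that, at the cost of needing a view of all segments when any one segment's doc ID set is built.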

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

