lucene-dev mailing list archives

From "Trejkaz (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-1683) RegexQuery matches terms the input regex doesn't actually match
Date Thu, 11 Jun 2009 01:43:07 GMT
RegexQuery matches terms the input regex doesn't actually match
---------------------------------------------------------------

                 Key: LUCENE-1683
                 URL: https://issues.apache.org/jira/browse/LUCENE-1683
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 2.3.2
            Reporter: Trejkaz


I was writing some unit tests for our own wrapper around the Lucene regex classes, and got
tripped up by something interesting.

The regex "cat." matches "cats", but it also matches any term that starts with "cat" followed
by one or more characters (e.g. "cathy", "catcher", ...). It is as if an implicit .* were
always appended to the end of the regex.
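
The symptom above is what a prefix-style match produces. As a rough illustration using only
java.util.regex, the sketch below contrasts Matcher.lookingAt() (the match must start at the
beginning of the input but need not consume all of it) with Matcher.matches() (the whole
input must match). Whether the contrib regex code actually uses a prefix-style match
internally is an assumption here, not something this report confirms; the class and method
names are illustrative only.

```java
import java.util.regex.Pattern;

public class PrefixVersusFullMatch {
    // Prefix-style match: the regex must match at the start of the term,
    // but trailing characters after the match are allowed.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Full match: the regex must consume the entire term.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"cat", "cats", "cathy"}) {
            System.out.println(term
                    + ": prefix=" + prefixMatch("cat.", term)
                    + " full=" + fullMatch("cat.", term));
        }
    }
}
```

With a prefix-style match, "cathy" is accepted because "cath" satisfies "cat." and the
trailing "y" is ignored; with a full match only "cats" is accepted.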

Here's a unit test for the behaviour I would expect myself:

    @Test
    public void testNecessity() throws Exception {
        File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index");
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        try {
            Document doc = new Document();
            doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        } finally {
            writer.close();
        }

        IndexReader reader = IndexReader.open(dir);
        try {
            TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader);
            assertEquals("Wrong term", "cats", terms.term().text());
            assertFalse("Should have only been one term", terms.next());
        } finally {
            reader.close();
        }
    }

This test fails on the term check: the first term returned is "cathy" rather than "cats".

Our workaround is to mangle the query like this:

    String fixed = String.format("(?:%s)$", original);
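
As a minimal sketch of why this works, the example below applies the same wrapping and then
runs a prefix-style match with Matcher.lookingAt(). The prefix-style matching is an
assumption about how the regex is evaluated, and the class and method names are hypothetical.
It also shows why the non-capturing group matters: without it, the trailing $ would bind
only to the last branch of an alternation.

```java
import java.util.regex.Pattern;

public class AnchoredRegexWorkaround {
    // Same wrapping as the workaround above: group the original regex and
    // require the end of the input immediately after it.
    static String anchor(String original) {
        return String.format("(?:%s)$", original);
    }

    // Prefix-style match, used here to stand in for the observed behaviour.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    public static void main(String[] args) {
        String fixed = anchor("cat.");
        System.out.println(fixed);
        System.out.println("cats:  " + prefixMatch(fixed, "cats"));
        System.out.println("cathy: " + prefixMatch(fixed, "cathy"));

        // Without the group, $ applies only to the "dog." branch, so the
        // "cat" branch still matches as a prefix of "cathy".
        System.out.println("ungrouped: " + prefixMatch("cat|dog.$", "cathy"));
        System.out.println("grouped:   " + prefixMatch(anchor("cat|dog."), "cathy"));
    }
}
```

One caveat: in java.util.regex, $ also matches just before a final line terminator, so \z
would be the stricter anchor if a term could ever end in a newline.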

