lucene-dev mailing list archives

From "Neil Despain (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-733) problems with some non word ascii characters in searchs
Date Wed, 29 Nov 2006 20:32:21 GMT
problems with some non word ascii characters in searchs
-------------------------------------------------------

                 Key: LUCENE-733
                 URL: http://issues.apache.org/jira/browse/LUCENE-733
             Project: Lucene - Java
          Issue Type: Bug
          Components: QueryParser, Search
            Reporter: Neil Despain


Here are a number of examples of searches that are not acting as I would expect.

1.
---------
I have a document with the text:
Smith, Bob

1.a
If I do a search:
Smith,~0.9 Bob~0.9

MultiPhraseQueryParser.parse(term) returns a query for:
content:smith,~0.9 content:bob~0.9

But it only gets a hit on: Bob


1.b
If I do this search:
"Smith,~0.9 Bob~0.9"~1

MultiPhraseQueryParser.parse(term) returns a query for:
content:"bob"~1

and it also only returns a hit for: Bob

In both cases, words that end with a comma are not found. (Other characters have the same effect
as commas.)
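
One possible reading of 1.a, offered as an unconfirmed sketch rather than a diagnosis: if the indexed term is "smith" (assuming the index-time analyzer lowercases and drops the trailing comma) while the fuzzy query term stays "smith," as printed in the parsed query, then the two differ by one edit. The similarity formula below (1 minus edit distance divided by the length of the shorter term) is an assumption made for illustration, and the class name FuzzySimilaritySketch is hypothetical.

```java
// Illustrative sketch only. The class name and the similarity formula are
// assumptions; this is not taken from Lucene or MultiPhraseQueryParser.
class FuzzySimilaritySketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1 - (edit distance / length of the shorter term).
    static double similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        if (shorter == 0) {
            return 0.0;
        }
        return 1.0 - (double) editDistance(queryTerm, indexedTerm) / shorter;
    }

    public static void main(String[] args) {
        // Query term keeps its comma; the indexed term is assumed not to.
        System.out.println("smith, vs smith: " + similarity("smith,", "smith"));
        System.out.println("bob vs bob:      " + similarity("bob", "bob"));
    }
}
```

Under these assumptions, "smith," scores below the requested 0.9 against "smith", while "bob" scores 1.0 against itself, which would be consistent with only Bob being hit.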

=========


2.
---------

For a document with phone numbers:
2124225100
212 422 5100
212-422-5100
(212) 422-5100
(212)4225100
(212)422-5100
(212) 422.5100
(212) 422 5100
212.422.5100
212.422-5100


2.a
If I do a search:
212*422*5100~0.9

MultiPhraseQueryParser.parse(term) returns a query for:
content:"(212.422-5100 212-422-5100 2124225100 212.422.5100)"

I do not get a match on (212)422-5100. The query doesn't find anything that starts with (212).


2.b
Search term:
212*422*5100

MultiPhraseQueryParser.parse(term) returns a query for:
content:212*422*5100

and does not match (212)422-5100. It doesn't find anything that starts with (212).
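
The expansion reported in 2.a can be reproduced with a small sketch, under two assumptions that are not confirmed by anything above: that the document is indexed as whitespace-separated terms with punctuation kept, and that a wildcard must match a whole term from its first character. The class WildcardExpansionSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only. It assumes whitespace tokenization and
// whole-term wildcard matching; neither is confirmed for the real parser.
class WildcardExpansionSketch {

    // The phone numbers from the example document, one per line.
    static final String PHONE_NUMBERS = String.join("\n",
            "2124225100", "212 422 5100", "212-422-5100", "(212) 422-5100",
            "(212)4225100", "(212)422-5100", "(212) 422.5100", "(212) 422 5100",
            "212.422.5100", "212.422-5100");

    // Translate '*' and '?' wildcards into a regex; other characters are literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Return the distinct whitespace-separated terms that the wildcard
    // matches in full, in order of first appearance.
    static List<String> expand(String wildcard, String text) {
        Pattern pattern = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (pattern.matcher(term).matches() && !hits.contains(term)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(expand("212*422*5100", PHONE_NUMBERS));
    }
}
```

Under these assumptions the wildcard expands to the same four terms listed in 2.a (2124225100, 212-422-5100, 212.422.5100, 212.422-5100), and terms such as (212)422-5100 are excluded only because they begin with a parenthesis rather than 212.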


2.c
If I try to work around that by searching with proximity for:
"212 422*5100"~1

MultiPhraseQueryParser.parse(term) returns a query for:
content:"(422-5100 422.5100 4225100)"~1

and again does not find anything containing (212), such as (212) 422-5100 or (212)422-5100.
=========

