Message-ID: <20020419182408.95224.qmail@web12702.mail.yahoo.com>
Date: Fri, 19 Apr 2002 11:24:08 -0700 (PDT)
From: Otis Gospodnetic
Subject: RE: Wildcard Searching
To: Lucene Users List
Cc: "Howk, Michael"
In-Reply-To: <552E81EFEDA8D3118AA7009027DE057B0157BFD3@fsc_exchange.fsc.follett.com>

Did the change that you mentioned below really work for you?

I wrote this class:
http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=1638
and it looks like the bug is not in QueryParser, but in some Java class
(could it be WildcardTermEnum?), since the class does not make use of
QueryParser and still demonstrates that WildcardQuery doesn't work
properly.

Thanks,
Otis

--- "Howk, Michael" wrote:
> We just tried adding the "?" character to QueryParser.jj under
> <#_TERM_START_CHAR>. We noticed that the "*" was in that list, so we
> figured we'd just give it a try.
> It seems to have worked. Now when we search on rou?d, we get hits on
> the word "round". We're going to try searching for some other
> variations to make sure that we've done the right thing.
>
> We'd still be interested to know exactly why this worked (assuming it
> continues to solve our problem). What is a TERM_START_CHAR and how is
> it used? Obviously it does something important. :-)
>
> -----Original Message-----
> From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
> Sent: Wednesday, February 27, 2002 11:14 AM
> To: 'Lucene Users List'
> Subject: RE: Wildcard Searching
>
> The StandardAnalyzer uses a lowercase filter, but we tried indexing
> "the round hat", just to make sure. The * still worked, but the ?
> still failed.
>
> We noticed that the ? character is listed in the QueryParser as a
> WILDTERM. But after that, the code heads into the WildcardQuery
> class, and we get lost amidst "setEnum()" and "wildcardEquals()"
> stuff. :-)
>
> Seriously though, we're using the StandardAnalyzer directly from
> Lucene. I suppose it's possible that the ? is a special character
> that's getting stripped out. But we need help to find out exactly
> where the special characters are defined or filtered.
>
> Michael
>
> -----Original Message-----
> From: Aruna Raghavan [mailto:ArunaR@opin.com]
> Sent: Wednesday, February 27, 2002 11:00 AM
> To: 'Lucene Users List'
> Subject: RE: Wildcard Searching
>
> From my experience with wildcards:
> 1. They are case sensitive, while regular queries aren't.
> 2. Only one wildcard is allowed in a word. If you are using this with
>    a boolean query, you can use something like (asas*) AND (fhg*fd).
>    This is acceptable.
> 3. At least one character is required before the wildcard in a query
>    (*fhhd is not valid).
> 4. Special characters are not supported (? may be a special
>    character).
> Hope this helps!
>
> -----Original Message-----
> From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
> Sent: Wednesday, February 27, 2002 10:56 AM
> To: Lucene Mailing List (E-mail)
> Subject: Wildcard Searching
>
> We're really struggling with trying to understand why the
> WildcardQuery seems to strip out the question mark by replacing it
> with a space. We're using the daily build, and a StandardAnalyzer.
> We've got the text "The Round Window" in our index. If we search on
> "roun*" the Lucene QueryParser returns a hit. When we search on
> "roun?", we don't get any hits. We don't even know how to make heads
> or tails of the WildcardQuery or WildcardTermEnum classes.
>
> Also, Lucene returns the parsed version of each of our searches. When
> we search by rou*d, Lucene parses it as rou*d (which is what we would
> expect). But when we search by rou?d, Lucene parses it as "rou d". It
> seems to wrap the term in quotes and replace the question mark with a
> space. Any ideas? Or can someone give us an idea of how to understand
> WildcardQuery or WildcardTermEnum?
>
> Michael
>
> --
> To unsubscribe, e-mail:
> For additional commands, e-mail:

[Attachment: WildcardQuestionmarkTest.java]

import org.apache.lucene.search.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import java.io.IOException;

public class WildcardQuestionmarkTest
{

    public static void main (String[] args)
        throws IOException
    {
        new WildcardQuestionmarkTest();
    }

    public WildcardQuestionmarkTest()
        throws IOException
    {
        RAMDirectory indexStore = new RAMDirectory();
        IndexWriter writer = new IndexWriter(indexStore, new SimpleAnalyzer(), true);
        Document doc1 = new Document();
        Document doc2 = new Document();
        Document doc3 = new Document();
        Document doc4 = new Document();
        doc1.add(Field.Text("body", "metal"));
        doc2.add(Field.Text("body", "metals"));
        doc3.add(Field.Text("body", "mXtals"));
        doc4.add(Field.Text("body", "mXtXls"));
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.optimize();
        IndexSearcher searcher = new IndexSearcher(indexStore);
        Query query1 = new TermQuery(new Term("body", "m?tal"));       // 1
        Query query2 = new WildcardQuery(new Term("body", "metal?"));  // 2
        Query query3 = new WildcardQuery(new Term("body", "metals?")); // 1
        Query query4 = new WildcardQuery(new Term("body", "m?t?ls"));  // 3
        Hits results1 = searcher.search(query1);
        Hits results2 = searcher.search(query2);
        Hits results3 = searcher.search(query3);
        Hits results4 = searcher.search(query4);
        System.out.println("Searching for m?tal got " + results1.length() + " results.");
        System.out.println("Searching for metal? got " + results2.length() + " results.");
        System.out.println("Searching for metals? got " + results3.length() + " results.");
        System.out.println("Searching for m?t?ls got " + results4.length() + " results.");
        writer.close();
    }
}
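To see what the attached test is probing without going through WildcardTermEnum or wildcardEquals(), the matching rule itself can be sketched in a few lines. This is a minimal sketch under an assumption, not Lucene's implementation: it follows the semantics described in Lucene's query syntax documentation, where `?` matches exactly one character and `*` matches zero or more. The class `WildcardSketch` and its methods are hypothetical. It checks the four patterns from WildcardQuestionmarkTest against the four indexed terms as SimpleAnalyzer would store them (lowercased).

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of whole-term wildcard matching (not Lucene code).
 * Assumption: '?' matches exactly one character, '*' matches zero or more.
 */
public class WildcardSketch {

    /** Returns true if the entire term matches the entire pattern. */
    public static boolean matches(String pattern, String term) {
        return matchFrom(pattern, 0, term, 0);
    }

    // Recursive matcher; fine for short terms like the ones in the test.
    private static boolean matchFrom(String p, int pi, String t, int ti) {
        // Pattern exhausted: it matches only if the term is exhausted too.
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // '*' either matches nothing, or swallows one more term character.
            return matchFrom(p, pi + 1, t, ti)
                || (ti < t.length() && matchFrom(p, pi, t, ti + 1));
        }
        if (ti == t.length()) {
            return false; // '?' or a literal character needs one more term character
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matchFrom(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    /** Counts how many of the given terms match the pattern. */
    public static int count(String pattern, List<String> terms) {
        int n = 0;
        for (String term : terms) {
            if (matches(pattern, term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Terms from the test, lowercased as SimpleAnalyzer would index them.
        List<String> terms = Arrays.asList("metal", "metals", "mxtals", "mxtxls");
        for (String pattern : Arrays.asList("m?tal", "metal?", "metals?", "m?t?ls")) {
            System.out.println(pattern + " -> " + count(pattern, terms));
        }
    }
}
```

Run as is, it prints m?tal -> 1, metal? -> 1, metals? -> 0 and m?t?ls -> 3. If the trailing comments in the attachment are expected hit counts, the two that differ (2 for metal?, 1 for metals?) are exactly the cases where `?` would also have to match an empty suffix, so those are the cases where the two readings of `?` diverge.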
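Michael's original symptom, rou?d being shown as "rou d", looks like what happens when the query text is handled as an ordinary term rather than a wildcard term. As far as we can tell, Lucene 1.x's QueryParser passes ordinary term text through the analyzer and, when more than one token comes back, builds a PhraseQuery, whose printed form is quoted. The sketch below only simulates that path. `PhraseSketch`, `analyze` and `toQueryString` are hypothetical names, and `analyze` is a simplified stand-in for StandardAnalyzer that splits on anything other than a letter or digit and lowercases; it is not Lucene's tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of analyzing ordinary query-term text (not Lucene code). */
public class PhraseSketch {

    /** Simplified analyzer stand-in: split on non-letters/digits, then lowercase. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character (such as '?') ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** No tokens -> empty; one token -> a plain term; several -> a quoted phrase. */
    public static String toQueryString(String text) {
        List<String> tokens = analyze(text);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return tokens.get(0);
        }
        return "\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        // Treated as ordinary term text, rou?d is split into "rou" and "d".
        System.out.println(toQueryString("rou?d")); // "rou d"
        System.out.println(toQueryString("Round"));  // round
    }
}
```

This would also be consistent with the QueryParser.jj change described above appearing to help: if rou?d can no longer be read as a plain term, it never reaches the analyzer and is not split into a phrase.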