lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From coldserenity <muntyanu.ro...@gmail.com>
Subject Search with wild-cards by words with forward-slash ("/")
Date Thu, 24 Sep 2009 17:49:46 GMT

Hello,
  
    I've been searching the forum and found several more or less relevant
topic listed below.
        
http://www.nabble.com/Parsing-text-containing-forward-slash-and-wildcard-td13541503.html#a13541503
http://www.nabble.com/Parsing-text-containing-forward-slash-and-wildcard-td13541503.html#a13541503

        
http://www.nabble.com/Search-that-supports-all-valid-characters-in-a-Unix-filename-td11495983.html
http://www.nabble.com/Search-that-supports-all-valid-characters-in-a-Unix-filename-td11495983.html

        
http://www.nabble.com/Payloads%2C-Tokenizers%2C-and-Filters.--Oh-My!-td13806662.html
http://www.nabble.com/Payloads%2C-Tokenizers%2C-and-Filters.--Oh-My!-td13806662.html 
    However they did not help me much and that's why I've decided to start a
separate topic.

    Basically what I need to do, is to be able to search '/' (forward slash)
symbol much the same as any alpha-numeric symbol AND with using wildcards.
E.g. if I have following records in my index
      1 |  /cat1
      2 |  /cat1/test
      3 |  /cat1/tea
      4 |  /cat2/test
    Search by
      /cat1* - will return records 1, 2, 3
      /cat1/test - will return record 2
      /cat1/t* - will return records 2, 3
      /cat1/t - will return no records

    I use Lucene with Compass. The searched property is indexed as
"untokenized".
    I have tried using StandardAnalyzer and KeywordAnalyzer but got the
following in the 

    Results from testing
org.apache.lucene.analysis.standard.StandardAnalyzer
        Search string:             rootKey:/cat1/te*
        Rewritten search string:   rootKey:/cat1/te*
        Found results          :   0

        Search string:             rootKey:/cat1/test
        Rewritten search string:   rootKey:"cat1 test"
        Found results          :   1

        Search string:             rootKey:/cat1/te
        Rewritten search string:   rootKey:"cat1 te"
        Found results          :   0

    Results from testing org.apache.lucene.analysis.KeywordAnalyzer
        Search string:             rootKey:/cat1/te
        Rewritten search string:   rootKey:/cat1/te
        Found results          :   0

        Search string:             rootKey:/cat1/te*
        Rewritten search string:   rootKey:/cat1/te*
        Found results          :   0

    So it looks like Lucene internally replaces the '/' with whitespace when
searching however I did not find any "replace symbol" code in the source
code of StandardAnalyzer, neither I had much luck in finding that place
during debugging the Lucene sources.
    
    With that all said, I would appreciate any hint on how this task could
be accomplished.
    
Regards,
  Roman
-- 
View this message in context: http://www.nabble.com/Search-with-wild-cards-by-words-with-forward-slash-%28%22-%22%29-tp25578017p25578017.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message