lucene-java-user mailing list archives

From "Andre Rubin" <andre.ru...@gmail.com>
Subject MultiPhrase search
Date Mon, 25 Aug 2008 15:32:31 GMT
Hi all,

Let's say that I have in my index the value "One Two Three" for field 'A'.
I'm using a custom analyzer that is described in the forwarded message.

My search query is built like this:

    QueryParser parser = new QueryParser(LABEL_FIELD, ANALYZER);
    Query query = parser.parse(LABEL_FIELD + ":" + searchQuery);

So when I search for

1) On*

I get a match with "One Two Three"

But when I search for

2) One Tw*

I get no matches. query.toString outputs:

3) label:One label:Tw*

which is not what I want: the parser split my input on whitespace into two separate clauses. If I surround my search string with double quotes:

4) "One Tw*"

query.toString outputs what I was expecting:

5) label:One Tw*

but I still get no matches, and now even the first search (1) doesn't work.
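Here is a plain-Java model of why (3) and (5) miss. It assumes the field was indexed with the KeywordLowerAnalyzer from the forwarded message, so the whole value is stored as one lowercased term, "one two three". The class and method names are hypothetical and only imitate the parser's behaviour; they are not Lucene's API. Unquoted input is split on whitespace into a term clause and a prefix clause. Quoted input goes through the analyzer as the single literal term "one tw*", and the * is not treated as a wildcard there. Only a prefix test on the whole lowercased input matches.

```java
import java.util.Locale;

// Simplified model, not Lucene code: the field holds exactly one term because
// KeywordTokenizer emits the entire value as a single token and
// LowerCaseFilter lowercases it.
class KeywordFieldModel {
    static final String INDEXED_TERM = "One Two Three".toLowerCase(Locale.ROOT); // "one two three"

    // Unquoted input: split on whitespace, one optional (OR) clause per word,
    // so the document matches if any single clause matches.
    static boolean unquotedMatches(String input) {
        for (String word : input.trim().split("\\s+")) {
            if (clauseMatches(word.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Quoted input: the whole string becomes one token, "*" included,
    // so only an exact term match could succeed.
    static boolean quotedMatches(String input) {
        return INDEXED_TERM.equals(input.toLowerCase(Locale.ROOT));
    }

    // What the search needs: a prefix test on the whole lowercased input.
    static boolean wholeInputPrefixMatches(String input) {
        String prefix = input.toLowerCase(Locale.ROOT);
        if (prefix.endsWith("*")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return INDEXED_TERM.startsWith(prefix);
    }

    // One word clause: a trailing "*" means prefix match, otherwise exact term match.
    private static boolean clauseMatches(String word) {
        if (word.endsWith("*")) {
            return INDEXED_TERM.startsWith(word.substring(0, word.length() - 1));
        }
        return INDEXED_TERM.equals(word);
    }

    public static void main(String[] args) {
        System.out.println(unquotedMatches("On*"));             // true  (search 1)
        System.out.println(unquotedMatches("One Tw*"));         // false (search 3)
        System.out.println(quotedMatches("One Tw*"));           // false (search 5)
        System.out.println(quotedMatches("On*"));               // false (search 1, quoted)
        System.out.println(wholeInputPrefixMatches("One Tw*")); // true
    }
}
```

In Lucene terms, the last case would roughly correspond to building `new PrefixQuery(new Term(LABEL_FIELD, "one tw"))` directly instead of going through QueryParser.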

Looking through the API, I found MultiPhraseQuery, but its documentation was
very confusing. I tried it, but with no luck (I think I used it wrongly). In
any case, is MultiPhraseQuery what I'm looking for? If it is, how should I use
the MultiPhraseQuery class?
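For the "phrase with a trailing wildcard" case, the MultiPhraseQuery javadoc describes this pattern. First call add(Term) for the leading word. Then collect every indexed term that starts with the prefix, for example by enumerating IndexReader.terms(Term). Finally, add that array with add(Term[]). The plain-Java model below imitates only the matching rule: each position accepts any of its alternative terms, and positions must be consecutive tokens. The class name and term lists are hypothetical, not Lucene's API. The model also shows a catch for this particular field. With KeywordTokenizer there is only one token, so no two-position phrase can match, whereas a whitespace-tokenized copy of the field would match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of MultiPhraseQuery matching, not Lucene's implementation:
// a phrase is a list of positions, and each position accepts any of several terms.
class MultiPhraseModel {

    // Stand-in for expanding "tw*" against the field's term dictionary.
    static List<String> expandPrefix(List<String> dictionary, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary) {
            if (term.startsWith(prefix)) {
                out.add(term);
            }
        }
        return out;
    }

    // True if some run of consecutive tokens satisfies every position in order.
    static boolean matches(List<String> tokens, List<List<String>> phrase) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.size(); i++) {
                if (!phrase.get(i).contains(tokens.get(start + i))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> keywordTokens = Arrays.asList("one two three");          // KeywordLowerAnalyzer
        List<String> whitespaceTokens = Arrays.asList("one", "two", "three"); // tokenized field
        List<String> twTerms = expandPrefix(whitespaceTokens, "tw");          // [two]
        List<List<String>> phrase = Arrays.asList(Arrays.asList("one"), twTerms);
        System.out.println(matches(keywordTokens, phrase));    // false
        System.out.println(matches(whitespaceTokens, phrase)); // true
    }
}
```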

Thanks,


Andre


---------- Forwarded message ----------
From: Andre Rubin <andre.rubin@gmail.com>
Date: Thu, Aug 21, 2008 at 2:21 AM
Subject: Re: Case Sensitivity
To: java-user@lucene.apache.org


Just to add to that: as I said before, in my case I found it more useful not
to use UN_TOKENIZED. Instead, I used TOKENIZED with a custom analyzer that
uses the KeywordTokenizer (the entire input becomes a single token) together
with the LowerCaseFilter. This way I get the best of both worlds: the whole
value is matched as one term, but case-insensitively.

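import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

// Indexes the whole field value as a single, lowercased token.
public class KeywordLowerAnalyzer extends Analyzer {

    public TokenStream tokenStream(String fieldName, Reader reader) {
        // KeywordTokenizer emits the entire input as one token...
        TokenStream result = new KeywordTokenizer(reader);
        // ...and LowerCaseFilter lowercases it for case-insensitive matching.
        result = new LowerCaseFilter(result);
        return result;
    }

}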
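The following plain-Java model makes "the best of both worlds" concrete. It contrasts a value indexed verbatim, as with UN_TOKENIZED, with one passed through KeywordTokenizer and LowerCaseFilter. The class and method names are hypothetical, not Lucene's API. Both approaches produce a single term for the whole value. Only the second lowercases it, so a query run through the same analysis matches regardless of case. The verbatim case assumes the query term is used as-is, as with a hand-built TermQuery. String.toLowerCase(Locale.ROOT) stands in for LowerCaseFilter and may differ from it on some non-ASCII characters.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the two indexing choices; not Lucene code.
class IndexingChoiceModel {

    // UN_TOKENIZED: the value is indexed verbatim as one term.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    // TOKENIZED with KeywordTokenizer + LowerCaseFilter: still one term, but lowercased.
    static List<String> keywordLower(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Exact lookup: the query's terms must equal the indexed terms.
    static boolean exactMatch(List<String> indexed, List<String> queryTerms) {
        return indexed.equals(queryTerms);
    }

    public static void main(String[] args) {
        System.out.println(keywordLower("One Two Three")); // [one two three]
        System.out.println(exactMatch(untokenized("One Two Three"), untokenized("one two three")));   // false
        System.out.println(exactMatch(keywordLower("One Two Three"), keywordLower("ONE two THREE"))); // true
    }
}
```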
