lucene-java-user mailing list archives

From Jochen Hebbrecht <jochenhebbre...@gmail.com>
Subject Re: Searching for a search string containing a literal slash doesn't work with QueryParser
Date Mon, 01 Oct 2012 15:33:25 GMT
Jack,

I wrote this custom analyzer:

----
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    final Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
    TokenStream sink = new LowerCaseFilter(matchVersion, source);
    return new TokenStreamComponents(source, sink);
}
----

I think this will do the trick too, right?

Jochen
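
[Editor's note] To make the effect of the analyzer above concrete, here is a minimal plain-Java sketch of the same idea: split on whitespace only, then lowercase each token. It deliberately does not use Lucene. The class and method names (AnalyzerChainSketch, analyze) are hypothetical, and String.split plus toLowerCase only approximate what WhitespaceTokenizer and LowerCaseFilter do, so treat it as an illustration under those assumptions rather than as the library's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {

    // Hypothetical helper, not a Lucene API:
    // split on runs of whitespace, then lowercase every token.
    // Punctuation such as '/' and '.' is left untouched.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [2012/0.124.323, javaone]
        System.out.println(analyze("2012/0.124.323 JavaOne"));
    }
}
```

Because the same steps run on both the indexed text and the query text, "JavaOne" and "javaone" end up as the same token, while "2012/0.124.323" stays in one piece.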

2012/10/1 Jack Krupansky <jack@basetechnology.com>

> Sorry, I meant apply the filter to the TOKENIZER that the analyzer uses.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Jack Krupansky
> Sent: Monday, October 01, 2012 10:44 AM
>
> To: java-user@lucene.apache.org
> Subject: Re: Searching for a search string containing a literal slash
> doesn't work with QueryParser
>
> You can apply the lower case filter to the whitespace or other analyzer and
> use that as the analyzer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Jochen Hebbrecht
> Sent: Monday, October 01, 2012 10:34 AM
> To: java-user@lucene.apache.org
> Subject: Re: Searching for a search string containing a literal slash
> doesn't work with QueryParser
>
> Hi Jack,
>
> I tried analyzing through WhitespaceAnalyzer. Now I can search on my query
> string AND I can find my document! Great!
> But all my searches are now case sensitive. So when I index a field as
> "JavaOne", I also have to enter in my search word: "JavaOne" and not
> "javaone" or "javaOne".
>
> How do you solve this in a proper way? Bringing all characters
> toLowerCase() when indexing them?
>
> Jochen
>
>
> 2012/10/1 Jack Krupansky <jack@basetechnology.com>
>
>  That's "The escape merely..."
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Jack Krupansky
>> Sent: Monday, October 01, 2012 9:58 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Searching for a search string containing a literal slash
>> doesn't work with QueryParser
>>
>>
>> The scape merely assures that the slash will not be parsed as query syntax
>> and will be passed directly to the analyzer, but the standard analyzer will
>> in fact always remove it. Maybe you want the whitespace analyzer or keyword
>> analyzer (no characters removed).
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Jochen Hebbrecht
>> Sent: Monday, October 01, 2012 8:59 AM
>> To: java-user@lucene.apache.org
>> Subject: Searching for a search string containing a literal slash doesn't
>> work with QueryParser
>>
>> Hi,
>>
>> I'm currently trying to search on the following search string in my Lucene
>> index: "2012/0.124.323".
>> This is the Java code I use to search ('value' is my search string):
>>
>> ----
>> QueryParser queryParser = new QueryParser(Version.LUCENE_36, field,
>>     new StandardAnalyzer(Version.LUCENE_36));
>> queryParser.setAllowLeadingWildcard(true);
>> return queryParser.parse(value);
>> ----
>>
>> This returns a query result: "2012" "0.124.323". QueryParser is replacing
>> the forward slash by a space.
>> I tried escaping the "/" with a backslash "\", but this doesn't work either.
>>
>> Some context that may be needed to fully understand my scenario: I have the
>> following import XML:
>>
>> ---
>> ...
>> <TEXT l="963" t="826" r="1391" b="870">Vervaldag </TEXT>
>> <TEXT l="963" t="826" r="1391" b="870">17/07/12</TEXT>
>> <TEXT l="2100" t="833" r="2275" b="871">09/07/12</TEXT>
>> <TEXT l="42" t="871" r="338" b="907">2012/0.124.323</TEXT>
>> <TEXT l="1478" t="938" r="1673" b="978">Kapitaals</TEXT>
>> ...
>> ---
>>
>> I get all TEXT values with an XPath expression and I index them as:
>>
>> ---
>> XPathExpression expr = xpath.compile("//TEXT");
>> Object result = expr.evaluate(document, XPathConstants.NODESET);
>> NodeList nodes = (NodeList) result;
>> for (int i = 0; i < nodes.getLength(); i++) {
>>    doc.add(new org.apache.lucene.document.Field("IMAGE",
>>        nodes.item(i).getFirstChild().getNodeValue(), Store.NO,
>>        Index.ANALYZED));
>> }
>> ---
>>
>> I'm using the StandardAnalyzer.
>>
>> What is the best way to solve my issue? Do I need to switch to a different
>> Analyzer? Do I have to use something other than QueryParser? ...
>> I also want to support searching on 2012/0.*, so I cannot use TermQuery
>> alone ...
>>
>> Kind regards,
>> Jochen
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
