lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: QueryParser in 3.x
Date Fri, 17 Sep 2010 19:31:03 GMT
On Fri, Sep 17, 2010 at 7:34 PM, Scott Smith <ssmith@mainstreamdata.com> wrote:
> First, let me say that I didn't think the problem was in QueryParser, and I apologize
> if that's how it sounded.  QueryParser is a central class in Lucene.  One of me is having
> problems with QueryParser; thousands of others are not.  Is the problem more likely in my
> code or in Lucene?  We'll all agree on the answer to that question.

Don't worry :)
>
> As further proof, I ran the following code.  The first part is from Simon's email (thanks
> for that snippet) and the second part is from LIA2.
>
>        // code from Willnauer email
>        Analyzer a = new MyAnalyzer(Version.LUCENE_30);
>        TokenStream stream = a.reusableTokenStream("body", new StringReader("Europabörsen"));
>        TermAttribute attr = stream.addAttribute(TermAttribute.class);
>        while(stream.incrementToken())
>        {
>          System.out.println(attr.term());
>        }
>
>        // code from LIA2
>        stream = a.tokenStream("body", new StringReader("Europabörsen"));
>        TermAttribute term = stream.addAttribute(TermAttribute.class);
>        while (stream.incrementToken())
>        {
>            System.out.print(term.term());
>        }
>
>
> The answer I got back was:
> europabörsen
> europaborsen
>
> I realized the difference between these two was whether I was getting the reusableTokenStream
> or the tokenStream.  In looking at my code, the ASCIIFoldingFilter was not in the filter
> setup for reusableTokenStream().  It was in the setup for tokenStream().  I added it to
> reusableTokenStream() and I now get the result I wanted.  The above code snippet now generates
> the word without the umlaut in both cases.  So, problem solved.
>
> Thanks to Simon for putting me on the right track.
Are you using Lucene 3.0? If so, take a look at ReusableAnalyzerBase,
which makes it much easier to build Analyzers and prevents code
duplication.
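
To illustrate why a single definition of the chain helps, here is a
minimal sketch that does not use Lucene at all. The class
SharedChainDemo and its methods are hypothetical stand-ins, and
java.text.Normalizer is only a rough substitute for ASCIIFoldingFilter
(it strips combining marks, nothing more). The point is just that both
entry points delegate to one chain-building method, so a filter cannot
be present in one path and missing in the other.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Lucene-free sketch; every name here is a hypothetical stand-in.
final class SharedChainDemo {

    // The single place where the chain is defined:
    // split on whitespace, lowercase, then fold accented letters.
    private static List<String> buildChain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // leading whitespace yields an empty first piece
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            // NFD splits an accented letter into base letter + combining mark;
            // removing the marks leaves the plain base letter.
            String folded = Normalizer.normalize(lower, Normalizer.Form.NFD)
                                      .replaceAll("\\p{M}+", "");
            tokens.add(folded);
        }
        return tokens;
    }

    // Stand-in for the non-reusable path.
    static List<String> tokenStream(String text) {
        return buildChain(text);
    }

    // Stand-in for the reusable path; same chain, so no drift is possible.
    static List<String> reusableTokenStream(String text) {
        return buildChain(text);
    }

    public static void main(String[] args) {
        String word = "Europab\u00f6rsen"; // o-umlaut written as an escape
        System.out.println(tokenStream(word));
        System.out.println(reusableTokenStream(word));
    }
}
```

Both calls return the same folded token for the test word, which is the
behavior Scott ended up with after adding the filter to both methods by
hand.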

simon
>
> Scott
>
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Friday, September 17, 2010 1:03 AM
> To: java-user@lucene.apache.org
> Subject: Re: QueryParser in 3.x
>
> On Fri, Sep 17, 2010 at 1:06 AM, Scott Smith <ssmith@mainstreamdata.com> wrote:
>> I recently upgraded to Lucene 3.0 and am seeing some new behavior that I don't understand.
>> Perhaps someone can explain why.
>>
>>
>>
>> I have a custom analyzer.  Part of the analyzer uses the ASCIIFoldingFilter.  If
>> I run a word with an umlaut through that analyzer using the AnalyzerDemo code in LIA2, I get
>> the same word except that the umlauted letter is now a simple ASCII letter (no umlaut).
>> That's what I would expect and want.
>>
>>
>>
>> If I create a QueryParser using the call "new QueryParser(LUCENE_30, "body", myAnalyzer)"
>> and then call the parse() method passing the same word, I can see that the query parser has
>> not removed the umlaut.  The string it has is "+body: Europabörsen".
>>
> This seems to be an issue with your analyzer rather than with the
> QueryParser, since QueryParser didn't really change its behavior in
> 3.0 except for some default values. Can you provide more info on what you
> did with your analyzer? Did you try running the term with umlaut chars
> through your Analyzer / TokenStream directly? Something like this:
>
> Analyzer a = new MyAnalyzer();
> TokenStream stream = a.reusableTokenStream("body", new
> StringReader("Europabörsen"));
> TermAttribute attr = stream.addAttribute(TermAttribute.class);
> while(stream.incrementToken())
>  System.out.println(attr.term());
>
> simon
>>
>>
>> I know I had to make a number of changes to the analyzer and the tokenizer to upgrade
>> to 3.x.  Is there something very different from the 2.x version that I'm likely missing?
>>
>>
>>
>> Anyone have any thoughts?
>>
>>
>>
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
