lucene-dev mailing list archives

From Ricardo Lopes <A...@alunos.ipca.pt>
Subject Re: Does QueryParser uses Analyzer ?
Date Tue, 30 Nov 2004 19:29:56 GMT
I was using an adaptation of the SearchFiles class distributed in the
demo (demo.org.apache.lucene.demo.SearchFiles).
The Analyzer is the BrazilianAnalyzer available in the sandbox
(org.apache.lucene.analysis.br.BrazilianAnalyzer).

 > My guess is that your analyzer is what did the splitting

After looking more carefully at the code, I found that the tokenStream
method in BrazilianAnalyzer calls StandardTokenizer, and that is
what splits the search string. Is there a simple way to subclass
the tokenizer to avoid splitting on those characters, or do I have to
write a custom implementation of that class?
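
To make the subclassing question concrete, here is a minimal JDK-only sketch of a tokenizer with two overridable hooks. It does not use Lucene: SimpleCharTokenizer and AccentAwareTokenizer are hypothetical names invented for illustration, loosely modelled on the isTokenChar/normalize style of hook, and the base class is deliberately ASCII-only so that it reproduces the "mem ria" symptom. It is not a description of what StandardTokenizer actually does.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

class HookedTokenizerSketch {

    // Hypothetical base class (not a Lucene class): emits runs of token characters.
    static class SimpleCharTokenizer {
        // Assumption for illustration: only ASCII letters count as token characters.
        protected boolean isTokenChar(char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }

        // Hook for per-character normalization; the default leaves the character alone.
        protected char normalize(char c) {
            return c;
        }

        final List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenChar(c)) {
                    current.append(normalize(c));
                } else if (current.length() > 0) {
                    // A non-token character ends the current token.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }
    }

    // Subclass that keeps accented letters inside a token and folds them
    // to their lower-case base letter.
    static class AccentAwareTokenizer extends SimpleCharTokenizer {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }

        @Override
        protected char normalize(char c) {
            // NFD splits an accented letter into base letter + combining mark;
            // keeping only the first char drops the mark.
            String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
            return Character.toLowerCase(decomposed.charAt(0));
        }
    }

    public static void main(String[] args) {
        String word = "mem\u00F3ria"; // "memória", written with an escape to avoid source-encoding issues
        System.out.println(new SimpleCharTokenizer().tokenize(word));
        System.out.println(new AccentAwareTokenizer().tokenize(word));
    }
}
```

With the ASCII-only base class the word comes out as two tokens, "mem" and "ria"; with the subclass it comes out as the single token "memoria". Whether the real tokenizer can be adjusted this way depends on how it is implemented in your Lucene version.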

 > though it could be something fishy in how you got the string into
 > QueryParser in the first place?

As this only happens when I run a search (during indexing those
characters are not split), I thought that it had to
do with the QueryParser, but it seems that the problem is with the
StandardTokenizer.
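
Since the same analyzer works at index time, one thing that may be worth ruling out is the "something fishy" possibility quoted above: that the query string is already damaged before it reaches QueryParser. The sketch below is JDK-only and purely diagnostic. The class and method names are hypothetical, and the failure mode it simulates (ISO-8859-1 bytes decoded as UTF-8) is an assumption, not something confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;

class QueryStringCheck {

    // Render every code point as U+XXXX so that damaged characters become visible.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // U+FFFD is the replacement character a decoder emits for bytes it cannot decode.
    static boolean looksMisdecoded(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    // Simulates the assumed failure mode: ISO-8859-1 bytes read back as UTF-8.
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "mem\u00F3ria"; // "memória"
        String damaged = decodeLatin1BytesAsUtf8(word);
        System.out.println(codePoints(word) + "  misdecoded=" + looksMisdecoded(word));
        System.out.println(codePoints(damaged) + "  misdecoded=" + looksMisdecoded(damaged));
    }
}
```

If the query string dumped just before the parse call shows U+FFFD where U+00F3 is expected, the splitting would come from the input encoding rather than from the tokenizer, because U+FFFD is not a letter.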

Thanks

Erik Hatcher wrote:

> On Nov 30, 2004, at 10:42 AM, Ricardo Lopes wrote:
>
>> Does the QueryParser class really use the Analyzer passed to the 
>> parse method?
>
>
> Absolutely.
>
>> I looked at the code and I don't see the object being used anywhere in
>> the class. The problem is that I am writing an application with Lucene
>> that searches in a foreign language with Latin characters. The
>> indexing works fine, but the search apparently doesn't call the Analyzer.
>
>
> Look at the getFieldQuery method.  It uses it to extract the tokens 
> from each part of the query (phrases and stand-alone terms).
>
>> Here is an example:
>> I have a file that contains the following word: memória
>> If I search for: memoria (without the accent mark on the o)
>> it finds the word, which is correct.
>> If I search for: memória (the exact same word) it doesn't find the
>> word, because the QueryParser splits the word into "mem ria", but if
>> the analyzer were called, the "ó" would be replaced with "o". I guess
>> the analyzer isn't called, is this right?
>
>
> What Analyzer are you using?  My guess is that your analyzer is what 
> did the splitting, though it could be something fishy in how you got 
> the string into QueryParser in the first place?
>
>     Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
