lucene-java-user mailing list archives

From Roberto Fonti <roberto.fo...@buongiorno.com>
Subject Re: UN_TOKENIZED and StandardAnalyzer
Date Fri, 06 Apr 2007 15:53:18 GMT
Thanks Erick for your help.
Actually I was already using Luke!
The only thing I was missing was the possibility of using different 
Analyzers at the same time, with PerFieldAnalyzerWrapper.
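For anyone finding this thread later, here is a minimal sketch of the idea in plain Java. It is a toy model and not Lucene code: `PerFieldSketch`, `KEYWORD`, `STANDARD` and the method names are all hypothetical, and the "standard" analysis here only lowercases and splits on whitespace. In Lucene itself the corresponding class is PerFieldAnalyzerWrapper, where you register an analyzer per field name with addAnalyzer(). The model also skips the query parser's own whitespace splitting, so the quoting Erick describes below is still needed with the real QueryParser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model only: none of these names are Lucene classes.
final class PerFieldSketch {
    // "Keyword" analysis: the whole input becomes one term, unchanged.
    static final Function<String, List<String>> KEYWORD =
        text -> Arrays.asList(text);

    // Simplified "standard" analysis: lowercase, then split on whitespace.
    static final Function<String, List<String>> STANDARD =
        text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    // Field name -> set of indexed terms.
    private final Map<String, Set<String>> index = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Override the default analyzer for one field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // An untokenized value is indexed as a single term, with no analysis.
    void indexUntokenized(String field, String value) {
        index.computeIfAbsent(field, f -> new HashSet<>()).add(value);
    }

    // Analyze the query text with the field's analyzer, then report a hit
    // if any resulting term is in the index (OR semantics).
    boolean matches(String field, String queryText) {
        List<String> terms = perField.getOrDefault(field, defaultAnalyzer).apply(queryText);
        Set<String> indexed = index.getOrDefault(field, Collections.emptySet());
        return terms.stream().anyMatch(indexed::contains);
    }
}
```

With only the default "standard" analysis, a query for "winter sport" is split into two terms and misses the single indexed term; once the keyword analysis is registered for CATEGORY alone, the same query hits, while other fields keep the default.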

Thank you again.
Best,
Roberto

Erick Erickson wrote:
> Really, really, really get a copy of Luke. Really. Use it to open
> your index and run experimental queries, especially to see
> how they get rewritten (but be sure to pick the appropriate
> analyzer).
>
> Google "lucene luke". Really, really get a copy. It'll help you
> make MUCH faster progress than waiting for someone to
> get around to answering posts here <G>....
>
> Then use query.toString() to see how things parse in your programs
> (or use Luke's search tab to see something similar).
>
> If you do, you'll see that parsing the string "winter sport" with
> StandardAnalyzer becomes
> CATEGORY:winter CATEGORY:sport
>
> With the default operator OR, this will not match an untokenized
> field of "winter sport". AND won't work either, because it's an
> UN_TOKENIZED field and the only indexed term is the whole string.
>
> Using KeywordAnalyzer gives
> CATEGORY:winter CATEGORY:sport
> This does NOT match the document I indexed as UN_TOKENIZED
> 'winter sport'. What the heck?
>
> But using KeywordAnalyzer with
> "\"winter sport\""
>
> That is, with the double quotes as part of the query, using
> KeywordAnalyzer gives
> CATEGORY:winter sport
> and produces the correct match on my UN_TOKENIZED field. What's
> happening here is that the double quotes cause the parser to pass
> both words as a single token rather than two tokens.
>
> This has been a source of some confusion on the list. There
> is a very long thread you'll find if you search the archives
> for KeywordAnalyzer.
>
>
> Best
> Erick
>
> On 4/6/07, Roberto Fonti <roberto.fonti@buongiorno.com> wrote:
>>
>> Hi All,
>> I'm indexing categories with this code:
>>
>> for (Category category : item.getCategories()) {
>>     lucene_doc.add(new Field(
>>         "CATEGORY",
>>         category.getName(),
>>         Field.Store.NO,
>>         Field.Index.UN_TOKENIZED));
>> }
>>
>> And searching using the query:
>>
>> String query = "CATEGORY:("+category.getName()+")";
>>
>> I've configured the StandardAnalyzer both in the IndexWriter and
>> for the QueryParser.
>>
>> Everything works fine EXCEPT with categories that contain whitespace
>> (or other chars that get tokenized).
>>
>> * If category is "sport" - ok, I get the result from the search
>> * If category is "winter sport" - I get no result from search
>>
>> I've tried a number of search syntaxes:
>> +CATEGORY:"winter sport"
>> +CATEGORY:winter +CATEGORY:sport
>> +CATEGORY:(winter sport)
>> and others...
>> but none of them work.
>>
>> What's wrong with that?
>> By the way, using the KeywordAnalyzer it works, but it is not the
>> correct analyzer for my application.
>> Shouldn't the Analyzer be ignored for a Field.Index.UN_TOKENIZED field?
>>
>> Thanks,
>> Roberto
>>
>>
>>
>

-- 
Roberto Fonti
Technology Group
Buongiorno SpA

roberto.fonti@buongiorno.com
http://www.buongiorno.com
 
Il Girasole, Palazzo Marco Polo 324
20084 Lacchiarella, Milano ITALY

