lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: 2.0 and Tokenized versus UN_TOKENIZED
Date Sun, 05 Nov 2006 03:00:15 GMT
Two questions come to mind...

1> what analyzer are you using for the *query*? Is it possible that when you
query for city you're using a tokenizer that breaks up your city code?

2> what about case? I'll assume that you have tried to search one-word
cities, so how the stream is tokenized won't break the query places you
don't expect. But if you index Austin UN_TOKENZED, then search for it using,
say StandardAnalyzer, it'll look for austin and they won't match. This may
apply to Luke too. In Luke, you can choose a different analyzer
(WhitespaceAnalyzer comes to mind).

Hope this helps
Erick

On 11/4/06, James Rhodes <jrrhodes@gmail.com> wrote:
>
> I'm using the 2.0 branch and I've had issues with searching indexes where
> the fields aren't tokenized.
> For instance, my index consists of count,lastname,city,state and I used
> the
> following code to index it (the data is in a sql server db):
> *
>
> if*(count != 0) {
>
> doc.add(*new* Field("count", NumberUtils.*pad*(count),
> Field.*Store*.*YES*,
> Field.Index.*TOKENIZED*));
>
> }
>
> *if*(lastName != *null*) {
>
> doc.add(*new* Field("lastname", lastName, Field.Store.*YES*, Field.Index.*
> TOKENIZED*,Field.TermVector.*YES*));
>
> }
>
> *if*(city != *null*) {
>
> doc.add(*new* Field("city", city, Field.Store.*YES*, Field.Index.*UN_**
> TOKENIZED*));
>
> }
>
> *if*(state != *null*) {
>
> *doc*.add(*new* Field("*state*", state, Field.Store.*YES*, Field.Index.*
> TOKENIZED*));
>
> }
>
> *Using this code I can search by any field with my app EXCEPT city, though
> I
> see it in the index using Luke.  I also can't search for it using Luke.
> When
> I add Field.Index.TOKENIZED  to the city field, I can search by it fine.*
>
> *Is this normal behavior? This doesn't make sense to me. Tokenized should
> prevent me from searching unless I'm missing something. Any ideas?
> Thanks!*
>
> *B*
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message