lucene-java-user mailing list archives

From "James Rhodes" <jrrho...@gmail.com>
Subject Re: 2.0 and Tokenized versus UN_TOKENIZED
Date Sun, 05 Nov 2006 13:11:50 GMT
Ahh. That makes sense and helps a lot. Thanks so much for the replies. I'm
sure I'll have more noob questions later today.

B


On 11/5/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
>
> what Erick was saying is that if you use an Analyzer to build your
> queries, that Analyzer has no way of knowing that the city field wasn't
> tokenized.
>
> your queries work when you tokenize city because the same analyzer is used
> at index time and at query time ... when you don't tokenize, no analyzer
> is used, so the value in the index is something like...
>
>        EAGLE RIVER
>
> but when you ask QueryParser to give you a Query object for city:"EAGLE
> RIVER" it uses the Analyzer you told it to use and makes a phrase query on
> "eagle" and "river" (or maybe something like "eagl" and "rivr" ... I don't
> remember if it has stemming by default)
>
> if you want it to work without tokenizing, you need to use something like
> the PerFieldAnalyzerWrapper, and the KeywordAnalyzer for the city field
> ... the KeywordAnalyzer at query time will leave the query text
> untokenized so it can match the untokenized value you indexed.
>
> : Date: Sat, 4 Nov 2006 22:18:13 -0500
> : From: James Rhodes <jrrhodes@gmail.com>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: Re: 2.0 and Tokenized versus UN_TOKENIZED
> :
> : Thanks. That helps, but I've tried a lot of combinations and I forget now.
> : I'm using StandardAnalyzer for the index and query. I can't say for sure if
> : I've tried other cases. The specific combination is lastname:rhodes AND
> : city:"EAGLE RIVER" AND state:AK. Before TOKENIZED, no match; after
> : TOKENIZED, match. Is there something special I need to do to ensure that
> : EAGLE RIVER is kept in the same field? I'm a newbie, admittedly, but I've
> : learned a lot since Friday. Thanks for the help.
> :
> : B
> :
> :
> : On 11/4/06, Erick Erickson <erickerickson@gmail.com> wrote:
> : >
> : > Two questions come to mind...
> : >
> : > 1> what analyzer are you using for the *query*? Is it possible that when
> : > you query for city you're using a tokenizer that breaks up your city code?
> : >
> : > 2> what about case? I'll assume that you have tried to search one-word
> : > cities, so how the stream is tokenized won't break the query in places you
> : > don't expect. But if you index Austin UN_TOKENIZED, then search for it
> : > using, say, StandardAnalyzer, it'll look for austin and they won't match.
> : > This may apply to Luke too. In Luke, you can choose a different analyzer
> : > (WhitespaceAnalyzer comes to mind).
> : >
> : > Hope this helps
> : > Erick
> : >
> : > On 11/4/06, James Rhodes <jrrhodes@gmail.com> wrote:
> : > >
> : > > I'm using the 2.0 branch and I've had issues with searching indexes
> : > > where the fields aren't tokenized.
> : > > For instance, my index consists of count,lastname,city,state and I
> : > > used the following code to index it (the data is in a sql server db):
> : > >
> : > > if (count != 0) {
> : > >     doc.add(new Field("count", NumberUtils.pad(count),
> : > >             Field.Store.YES, Field.Index.TOKENIZED));
> : > > }
> : > >
> : > > if (lastName != null) {
> : > >     doc.add(new Field("lastname", lastName, Field.Store.YES,
> : > >             Field.Index.TOKENIZED, Field.TermVector.YES));
> : > > }
> : > >
> : > > if (city != null) {
> : > >     doc.add(new Field("city", city, Field.Store.YES,
> : > >             Field.Index.UN_TOKENIZED));
> : > > }
> : > >
> : > > if (state != null) {
> : > >     doc.add(new Field("state", state, Field.Store.YES,
> : > >             Field.Index.TOKENIZED));
> : > > }
> : > >
> : > > Using this code I can search by any field with my app EXCEPT city,
> : > > though I see it in the index using Luke. I also can't search for it
> : > > using Luke. When I add Field.Index.TOKENIZED to the city field, I can
> : > > search by it fine.
> : > >
> : > > Is this normal behavior? This doesn't make sense to me. Tokenized
> : > > should prevent me from searching unless I'm missing something. Any
> : > > ideas? Thanks!
> : > >
> : > > B
> : > >
> : > >
> : >
> : >
> :
>
>
>
> -Hoss
>
>
>
>
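
The mismatch Hoss describes above can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's API: `tokenizing` is a hypothetical stand-in for an analyzer such as StandardAnalyzer (here it only lowercases and splits on whitespace, with no stemming or stop words), `keyword` is a hypothetical stand-in for KeywordAnalyzer, and `phraseMatches` is an illustrative helper that treats the index as a plain list of terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model (NOT Lucene's API) of index-time vs. query-time analysis. */
class AnalyzerMismatchDemo {

    /** Assumed stand-in for a tokenizing analyzer: lowercase, split on whitespace. */
    static List<String> tokenizing(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Assumed stand-in for a keyword analyzer: the whole value is one unchanged term. */
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    /** A phrase matches when the query terms occur, in order and adjacent, in the field. */
    static boolean phraseMatches(List<String> indexedTerms, List<String> queryTerms) {
        return Collections.indexOfSubList(indexedTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        String city = "EAGLE RIVER";

        // Field indexed UN_TOKENIZED, query built with a tokenizing analyzer:
        // the index holds "EAGLE RIVER", the query looks for "eagle" + "river".
        System.out.println(phraseMatches(keyword(city), tokenizing(city)));    // false

        // Field indexed TOKENIZED, same analyzer at query time: terms line up.
        System.out.println(phraseMatches(tokenizing(city), tokenizing(city))); // true

        // Field indexed UN_TOKENIZED, keyword analysis for that field at query time.
        System.out.println(phraseMatches(keyword(city), keyword(city)));       // true

        // Erick's case point: "Austin" untokenized vs. a lowercased query term.
        System.out.println(phraseMatches(keyword("Austin"), tokenizing("Austin"))); // false
    }
}
```

In real Lucene code, the third case corresponds to the fix Hoss suggests: wrap the default analyzer in a PerFieldAnalyzerWrapper that uses KeywordAnalyzer for the city field, and pass that wrapper to QueryParser.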
