lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: 2.0 and Tokenized versus UN_TOKENIZED
Date Sun, 05 Nov 2006 09:41:37 GMT

what Erick was saying is that if you use an Analyzer to build your
queries, that Analyzer has no way of knowing that the city field wasn't
tokenized.

your queries work when you tokenize city because the same analyzer is used
at index time and at query time ... when you don't tokenize, no analyzer
is used, so the value in the index is something like...

	EAGLE RIVER

but when you ask QueryParser to give you a Query object for city:"EAGLE
RIVER" it uses the Analyzer you told it to use and makes a phrase query on
"eagle" and "river" (or maybe something like "eagl" and "rivr" ... I don't
remember if it has stemming by default)
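
to make the mismatch concrete, here is a toy model in plain Java. it is
not Lucene code: indexUntokenized and analyze are hypothetical stand-ins
(the second one only lowercases and splits on whitespace), and the
"phrase match" ignores term order and position. it only shows that the
single indexed term and the analyzed query terms never line up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenMismatchDemo {
    // Stand-in for an UN_TOKENIZED field: the whole value is one term, unchanged.
    static List<String> indexUntokenized(String value) {
        return Collections.singletonList(value);
    }

    // Rough stand-in for a query-time analyzer: lowercase and split on
    // whitespace. The real StandardAnalyzer does more than this.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Simplified match: every query term must exist in the index as a term.
    static boolean allTermsIndexed(List<String> indexed, List<String> queryTerms) {
        return indexed.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String city = "EAGLE RIVER";
        // Untokenized at index time, analyzed at query time: no match.
        System.out.println(allTermsIndexed(indexUntokenized(city), analyze(city)));
        // Same analysis on both sides: match.
        System.out.println(allTermsIndexed(analyze(city), analyze(city)));
        // Untokenized on both sides (what a keyword-style analyzer gives you): match.
        System.out.println(allTermsIndexed(indexUntokenized(city), indexUntokenized(city)));
    }
}
```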

if you want it to work without tokenizing, you need to use something like
the PerFieldAnalyzerWrapper, and the KeywordAnalyzer for the city field
... the KeywordAnalyzer at query time will leave the query text
untokenized so it can match the untokenized value you indexed.
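
a minimal sketch of that setup, assuming the Lucene 2.0 API (later
versions changed constructors and package names). the class name, the
buildQuery helper, and the choice of "lastname" as the default field are
made up for illustration; the field names come from the original post.

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class CityQueryExample {
    // Parses a query so that "city" is left untokenized while every other
    // field goes through StandardAnalyzer. Assumes Lucene 2.0 on the classpath.
    static Query buildQuery(String userQuery) throws ParseException {
        PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("city", new KeywordAnalyzer());
        // "lastname" as the default field is an arbitrary choice for this sketch.
        QueryParser parser = new QueryParser("lastname", analyzer);
        return parser.parse(userQuery);
    }

    public static void main(String[] args) throws ParseException {
        Query q = buildQuery("lastname:rhodes AND city:\"EAGLE RIVER\" AND state:AK");
        System.out.println(q);
    }
}
```

two things to watch: the quotes around EAGLE RIVER are still needed,
because QueryParser splits unquoted text on whitespace before any
analyzer sees it, and KeywordAnalyzer does not lowercase, so the query
text has to match the indexed value's case exactly.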

: Date: Sat, 4 Nov 2006 22:18:13 -0500
: From: James Rhodes <jrrhodes@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Re: 2.0 and Tokenized versus UN_TOKENIZED
:
: Thanks. That helps, but I've tried a lot of combinations and I forget now.
: I'm using StandardAnalyzer for the index and query. I can't say for sure if
: I've tried other cases. The specific combination is lastname:rhodes AND
: city:"EAGLE RIVER" AND state:AK. Before TOKENIZED, no match; after TOKENIZED,
: a match. Is there something special I need to do to ensure that EAGLE RIVER is
: kept in the same field? I'm a newbie, admittedly, but I've learned a lot
: since Friday. Thanks for the help.
:
: B
:
:
: On 11/4/06, Erick Erickson <erickerickson@gmail.com> wrote:
: >
: > Two questions come to mind...
: >
: > 1> what analyzer are you using for the *query*? Is it possible that when
: > you query for city you're using a tokenizer that breaks up your city code?
: >
: > 2> what about case? I'll assume that you have tried to search one-word
: > cities, so how the stream is tokenized won't break the query in places you
: > don't expect. But if you index Austin UN_TOKENIZED, then search for it
: > using, say, StandardAnalyzer, it'll look for austin and they won't match.
: > This may apply to Luke too. In Luke, you can choose a different analyzer
: > (WhitespaceAnalyzer comes to mind).
: >
: > Hope this helps
: > Erick
: >
: > On 11/4/06, James Rhodes <jrrhodes@gmail.com> wrote:
: > >
: > > I'm using the 2.0 branch and I've had issues with searching indexes
: > > where the fields aren't tokenized.
: > > For instance, my index consists of count,lastname,city,state and I used
: > > the following code to index it (the data is in a sql server db):
: > >
: > > if (count != 0) {
: > >     doc.add(new Field("count", NumberUtils.pad(count),
: > >                       Field.Store.YES, Field.Index.TOKENIZED));
: > > }
: > >
: > > if (lastName != null) {
: > >     doc.add(new Field("lastname", lastName, Field.Store.YES,
: > >                       Field.Index.TOKENIZED, Field.TermVector.YES));
: > > }
: > >
: > > if (city != null) {
: > >     doc.add(new Field("city", city, Field.Store.YES,
: > >                       Field.Index.UN_TOKENIZED));
: > > }
: > >
: > > if (state != null) {
: > >     doc.add(new Field("state", state, Field.Store.YES,
: > >                       Field.Index.TOKENIZED));
: > > }
: > >
: > > Using this code I can search by any field with my app EXCEPT city, though
: > > I see it in the index using Luke. I also can't search for it using Luke.
: > > When I add Field.Index.TOKENIZED to the city field, I can search by it
: > > fine.
: > >
: > > Is this normal behavior? This doesn't make sense to me. Tokenized should
: > > prevent me from searching unless I'm missing something. Any ideas?
: > > Thanks!
: > >
: > > B
: > >
: > >
: >
: >
:



-Hoss

