Message-ID: <359a92830611041900u502378c0md801397394cbc203@mail.gmail.com>
Date: Sat, 4 Nov 2006 22:00:15 -0500
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: 2.0 and Tokenized versus UN_TOKENIZED
In-Reply-To: <386a4720611041419m2b029019ke93d57857fe68eb7@mail.gmail.com>
References: <386a4720611041419m2b029019ke93d57857fe68eb7@mail.gmail.com>

Two questions come to mind...

1> What analyzer are you using for the *query*? Is it possible that when you
query for city you're using a tokenizer that breaks up your city code?

2> What about case? I'll assume that you have tried searching for one-word
cities, so that tokenization isn't breaking the query in places you don't
expect. But if you index Austin UN_TOKENIZED and then search for it using,
say, StandardAnalyzer, the query will look for austin and the two won't match.

This may apply to Luke too. In Luke, you can choose a different analyzer
(WhitespaceAnalyzer comes to mind).

Hope this helps
Erick

On 11/4/06, James Rhodes wrote:
>
> I'm using the 2.0 branch and I've had issues with searching indexes where
> the fields aren't tokenized.
> For instance, my index consists of count,lastname,city,state and I used the
> following code to index it (the data is in a sql server db):
>
> if (count != 0) {
>     doc.add(new Field("count", NumberUtils.pad(count), Field.Store.YES,
>         Field.Index.TOKENIZED));
> }
> if (lastName != null) {
>     doc.add(new Field("lastname", lastName, Field.Store.YES,
>         Field.Index.TOKENIZED, Field.TermVector.YES));
> }
> if (city != null) {
>     doc.add(new Field("city", city, Field.Store.YES,
>         Field.Index.UN_TOKENIZED));
> }
> if (state != null) {
>     doc.add(new Field("state", state, Field.Store.YES,
>         Field.Index.TOKENIZED));
> }
>
> Using this code I can search by any field with my app EXCEPT city, though I
> see it in the index using Luke. I also can't search for it using Luke. When
> I add Field.Index.TOKENIZED to the city field, I can search by it fine.
>
> Is this normal behavior? This doesn't make sense to me. Tokenized should
> prevent me from searching unless I'm missing something. Any ideas?
> Thanks!
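
The case mismatch described in the reply can be sketched in plain Java without Lucene on the classpath. This is only an illustration under stated assumptions: `AnalyzerMismatchDemo`, `untokenized`, `lowercasingAnalyzer`, and `matches` are hypothetical names, and the lowercasing analyzer is a rough stand-in for a lowercasing, word-splitting analyzer such as StandardAnalyzer, not its real implementation. The point it tries to show is that an untokenized field is stored as one exact term, so a query whose terms were lowercased or split will not find it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class: simulates exact-term matching, it does not call Lucene.
class AnalyzerMismatchDemo {

    // Index-time view of an untokenized field: the whole value becomes one term, unchanged.
    static List<String> untokenized(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Assumed stand-in for a query-time analyzer that lowercases and splits on whitespace.
    static List<String> lowercasingAnalyzer(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A query matches only if every query term is present verbatim among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> city = new HashSet<>(untokenized("Austin"));

        // Query analyzed with the lowercasing analyzer: looks for "austin", index holds "Austin".
        System.out.println(matches(city, lowercasingAnalyzer("Austin"))); // false

        // Query left untouched: the exact term "Austin" is found.
        System.out.println(matches(city, untokenized("Austin")));         // true

        // Multi-word values are also split at query time, so they miss as well.
        Set<String> twoWords = new HashSet<>(untokenized("San Antonio"));
        System.out.println(matches(twoWords, lowercasingAnalyzer("San Antonio"))); // false
    }
}
```

In Lucene 2.0 itself, the usual way out would be to make the query side agree with the index side, for example by building a TermQuery on the exact stored value or by parsing that field with an analyzer that leaves the input whole, such as KeywordAnalyzer. Treat that as a pointer to check against the API documentation rather than a verified fix for this particular index.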