Subject: RE: Analyzer use at search time?
Date: Wed, 30 Apr 2003 09:35:16 -0400
From: "Eric Isakson"
To: "Lucene Users List"

When I looked into something similar in the past, I noticed that when an analyzer returns multiple tokens, the query parser treats them as a phrase. So if your document was indexed with the word:

    foo

and your search analyzer turns foo into the tokens:

    foo bar

then your query object will look for the phrase "foo bar". Your document only has the token foo, so you get no hits. I noticed this when I was running the unit tests against a modified version of the query parser.

I suspect this is what is causing your trouble, though I don't know how to "fix" it. You might consider taking the query parser code as a baseline and rolling your own that behaves a little differently.

Here is the snippet of code from QueryParser that does this:

    protected Query org.apache.lucene.queryParser.QueryParser.getFieldQuery(String field, Analyzer analyzer, String queryText) {
      // Use the analyzer to get all the tokens, and then build a TermQuery,
      // PhraseQuery, or nothing based on the term count
      TokenStream source = analyzer.tokenStream(field, new StringReader(queryText));
      Vector v = new Vector();
      org.apache.lucene.analysis.Token t;
      while (true) {
        try {
          t = source.next();
        }
        catch (IOException e) {
          t = null;
        }
        if (t == null)
          break;
        v.addElement(t.termText());
      }
      if (v.size() == 0)
        return null;
      else if (v.size() == 1)
        return new TermQuery(new Term(field, (String) v.elementAt(0)));
      else {
        PhraseQuery q = new PhraseQuery();
        q.setSlop(phraseSlop);
        for (int i=0; i
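
The mismatch described in this message can be reproduced without Lucene at all. The sketch below is only an illustration under stated assumptions: `searchAnalyze` is a hypothetical stand-in for a search-time analyzer that expands foo into the tokens foo and bar, and `matchesAsPhrase` / `matchesAsAnyTerm` are toy models of phrase matching and "any term" matching. None of these names are Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MultiTokenQuerySketch {

    // Hypothetical search-time analyzer (an assumption for illustration):
    // every occurrence of "foo" is expanded into the two tokens "foo", "bar".
    static List<String> searchAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input produces no tokens
            }
            tokens.add(word);
            if (word.equals("foo")) {
                tokens.add("bar");
            }
        }
        return tokens;
    }

    // Toy phrase semantics: all query tokens must occur consecutively,
    // in order, somewhere in the document's token list.
    static boolean matchesAsPhrase(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    // Toy alternative a "roll your own" parser might choose instead:
    // the document matches if it contains any one of the query tokens.
    static boolean matchesAsAnyTerm(List<String> docTokens, List<String> queryTokens) {
        for (String token : queryTokens) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexedDoc = Arrays.asList("foo");   // document indexed as the single token foo
        List<String> queryTokens = searchAnalyze("foo");  // search analyzer yields foo, bar

        System.out.println(queryTokens);                                // [foo, bar]
        System.out.println(matchesAsPhrase(indexedDoc, queryTokens));   // false
        System.out.println(matchesAsAnyTerm(indexedDoc, queryTokens));  // true
    }
}
```

Whether "any term" matching is the right replacement depends on what the search analyzer is meant to do; it is shown here only as one way a custom parser could behave differently from the phrase treatment.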