Date: Tue, 25 Apr 2006 16:02:32 -0700 (PDT)
From: Chris Hostetter
To: java-user@lucene.apache.org
Subject: Re: Alphanumeric model ids
In-Reply-To: <373A3DE3-8DBA-4EF2-AFC9-906AF8C04BA0@mac.com>

I bet that if you look at the toString() of the query you get back from your query parser,
you'll see that the non-numeric part numbers have been stemmed.

You took the right steps when you indexed the field as UN_TOKENIZED, but
at query time your query parser doesn't know about that -- take a look at
the PerFieldAnalyzerWrapper and the KeywordAnalyzer as a way to make sure
your query parser doesn't do any processing on the terms you search for in
non-tokenized fields.

Or ... prepare for shameless plug ... check out the Solr project. Solr
adds a very flexible schema layer on top of Lucene that lets you specify
field types and map fields (explicitly named, or dynamically created based
on field name patterns) to field types -- every field type can have a
different analyzer, two actually: one used when indexing and one used when
querying...

http://incubator.apache.org/solr/


: Date: Tue, 25 Apr 2006 16:53:24 -0600
: From: Jeremy Hanna
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Alphanumeric model ids
:
: I am trying to search by a number of fields including an alphanumeric
: model id.
:
: This is just the model id that comes from manufacturers. I've tried
: to use a StandardAnalyzer and a SnowballAnalyzer to index the data.
: Then I search with the associated analyzer using a
: MultiFieldQueryParser. Going through the debug into the attached
: Lucene source, I see that all a MultiFieldQueryParser does is make a
: bunch of queries and link them together with a Boolean query with
: SHOULD values. I see that it is getting the right field, "model",
: and has the right query in there, e.g. "XPHP", but it returns no
: results.
:
: When I index it, I do the following:
: modelField = new Field("model", (product.getModelNumber() == null) ?
: "" : product.getModelNumber(), Field.Store.NO,
: Field.Index.UN_TOKENIZED);
: ...
: document.add(modelField);
: ...
: indexWriter.addDocument(document);
:
: So it shouldn't be messing with the model id retrieved from the
: database when it puts it in the index (UN_TOKENIZED).
:
: The weird thing is that it finds those model ids that are only
: numeric (including punctuation, e.g. "40603-38"). But it cannot find
: the "XPHP" model id. On the command line SQL interface, I can do a
: select * from product where model = 'XPHP'; and it comes back with
: the single result.
:
: Anyone have any idea as to why the numeric ones would come up and the
: alphanumeric ones would not find the right values in the index?
:
: Thanks much,
: Jeremy
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss
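
To make the index-time vs. query-time mismatch concrete, here is a
minimal sketch in plain Java. It is a toy model and does not use Lucene's
API: the class PerFieldAnalysisSketch and its methods are hypothetical
names invented for illustration. It assumes the query-side analyzer alters
the term by lowercasing it (StandardAnalyzer does lowercase its terms);
stemming would cause the same kind of mismatch. A "model" value stored
verbatim, as with UN_TOKENIZED, is only found once that field gets a
pass-through analyzer at query time, which is the role
PerFieldAnalyzerWrapper plus KeywordAnalyzer play in the advice above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy model (NOT Lucene's API) of per-field query-time analysis. */
public class PerFieldAnalysisSketch {

    /** Stand-in for an analyzer that lowercases terms (an assumption about the cause). */
    public static final UnaryOperator<String> LOWERCASING =
            s -> s.toLowerCase(Locale.ROOT);

    /** Stand-in for a keyword-style analyzer: the input is one unmodified term. */
    public static final UnaryOperator<String> KEYWORD = s -> s;

    // field name -> terms stored exactly as given (like an untokenized field)
    private final Map<String, Set<String>> index = new HashMap<>();
    // field name -> analyzer override used at query time
    private final Map<String, UnaryOperator<String>> perField = new HashMap<>();
    private final UnaryOperator<String> defaultAnalyzer;

    public PerFieldAnalysisSketch(UnaryOperator<String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers a query-time analyzer for one field, overriding the default. */
    public void addAnalyzer(String field, UnaryOperator<String> analyzer) {
        perField.put(field, analyzer);
    }

    /** Stores the value verbatim, with no analysis at index time. */
    public void indexUntokenized(String field, String value) {
        index.computeIfAbsent(field, f -> new HashSet<>()).add(value);
    }

    /** Analyzes the query text for the field, then looks for an exact term match. */
    public boolean matches(String field, String queryText) {
        UnaryOperator<String> analyzer = perField.getOrDefault(field, defaultAnalyzer);
        String term = analyzer.apply(queryText);
        return index.getOrDefault(field, Collections.emptySet()).contains(term);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch sketch = new PerFieldAnalysisSketch(LOWERCASING);
        sketch.indexUntokenized("model", "XPHP");
        sketch.indexUntokenized("model", "40603-38");

        // Default analyzer rewrites "XPHP" to "xphp", so the exact lookup misses.
        System.out.println(sketch.matches("model", "XPHP"));      // false
        // Lowercasing leaves the numeric id unchanged, so it is still found.
        System.out.println(sketch.matches("model", "40603-38"));  // true

        // Per-field override: leave "model" query terms untouched.
        sketch.addAnalyzer("model", KEYWORD);
        System.out.println(sketch.matches("model", "XPHP"));      // true
    }
}
```

In real Lucene code the equivalent step would be to hand the query parser
a per-field wrapper whose "model" entry is a keyword analyzer, rather than
passing the StandardAnalyzer or SnowballAnalyzer directly.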