lucene-java-user mailing list archives

From Jeremy Hanna <jeremy_ha...@mac.com>
Subject Re: Alphanumeric model ids
Date Tue, 25 Apr 2006 23:59:53 GMT
Thanks Chris, it works like a champ now.  I had thought I looked at  
the queries themselves with toString but in any case, the queries  
actually work now.  I didn't realize that Lucene was customizable on  
so many levels - when you create the analyzer, when you create the  
index, when you perform each query.  Kinda cool.

On Apr 25, 2006, at 5:02 PM, Chris Hostetter wrote:

>
> I bet that if you look at the toString() of the query you get back from
> your query parser, you'll see that the non-numeric part numbers have been
> stemmed.
>
> You took the right steps when you indexed the field as UN_TOKENIZED, but
> at query time your query parser doesn't know about that -- take a look at
> the PerFieldAnalyzerWrapper and the KeywordAnalyzer as a way to make sure
> your query parser doesn't do any processing on the terms you search for in
> non-tokenized fields.
>
> Or ... prepare for shameless plug ... check out the Solr project.  Solr
> adds a very flexible schema layer on top of Lucene, that lets you specify
> field types, and map fields (explicitly named or dynamically created based
> on field name patterns) to field types -- every field type can have a
> different analyzer, two actually: one used when indexing and one used
> when querying...
> 	http://incubator.apache.org/solr/
>
>
> : Date: Tue, 25 Apr 2006 16:53:24 -0600
> : From: Jeremy Hanna <jeremy_hanna@mac.com>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: Alphanumeric model ids
> :
> : I am trying to search by a number of fields including an alphanumeric
> : model id.
> :
> : This is just the model id that comes from manufacturers.  I've tried
> : to use a StandardAnalyzer and a SnowballAnalyzer to index the data.
> : Then I search with the associated analyzer using a
> : MultiFieldQueryParser.  Stepping through the attached Lucene source in
> : the debugger, I see that all a MultiFieldQueryParser does is make a
> : bunch of queries and link them together with a Boolean query with
> : SHOULD clauses.  I see that it is getting the right field, "model",
> : and has the right query in there, e.g. "XPHP", but it returns no
> : results.
> :
> : When I index it, I do the following:
> : modelField = new Field("model", (product.getModelNumber() == null) ?
> : "" : product.getModelNumber(), Field.Store.NO,
> : Field.Index.UN_TOKENIZED);
> : ...
> : document.add(modelField);
> : ...
> : indexWriter.addDocument(document);
> :
> : So it shouldn't be messing with the model id retrieved from the
> : database when it puts it in the index (UN_TOKENIZED).
> :
> : The weird thing is that it finds those model ids that are only
> : numeric (including punctuation, e.g. "40603-38").  But it cannot find
> : the "XPHP" model id.  On the command line SQL interface, I can do a
> : select * from product where model = 'XPHP'; and it comes back with
> : the single result.
> :
> : Anyone have any idea as to why the numeric ones would come up and the
> : alphanumeric ones would not find the right values in the index?
> :
> : Thanks much,
> : Jeremy
> :
> : ---------------------------------------------------------------------
> : To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : For additional commands, e-mail: java-user-help@lucene.apache.org
> :
>
>
>
> -Hoss

